Shi Haojie, Gu Hong, Xu Xiaolu, Qin Pan. Prediction of gene expression level based on generalized linear model[J]. , 2020, 60(1): 69-74
Prediction of gene expression level based on generalized linear model
DOI: 10.7511/dllgxb202001010
Keywords: generalized linear model; master-slave model; histone modification; gene expression
Funding: Supported by the National Natural Science Foundation of China (81872247).
Abstract:
Histone modification is widespread in organisms and can regulate gene expression in several different ways. With the rapid development of high-throughput sequencing, the large volume of sequencing data now available makes it possible to explore the relationship between histone modification signals and gene expression levels. Because gene expression data are zero-inflated, this paper proposes a master-slave model built on the generalized linear model framework. The model predicts gene expression levels from histone modification signals with high accuracy. First, gene locus information from the human genome-wide annotation file is used to select the expression data that have complete gene locus information. Second, this locus information is used to locate and extract features at specific gene loci from the histone modification data, and these features form the design matrix. Finally, the master-slave model is constructed to account for the zero inflation of the response variable. Using the GM12878 cell line as an example, the model is compared with several existing regression algorithms, and the comparison verifies its effectiveness.
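The abstract describes the pipeline only at a high level. It does not say which features are extracted around each gene, which link functions are used, or how the two parts of the master-slave model are combined. The Python sketch below assumes one common two-part (hurdle) reading of a zero-inflated GLM, and it is not the authors' implementation:

- A hypothetical `build_design_matrix` averages each histone-mark track in bins around the transcription start site (TSS).
- A "master" binomial GLM with a logit link, fitted by IRLS, estimates whether a gene is expressed.
- A "slave" Gaussian GLM, fitted only on expressed genes, predicts log1p(expression).

All function and class names, the TSS-centred binning, the `half_width`/`n_bins` defaults, the log1p transform and the small ridge penalty are illustrative assumptions.

```python
# Sketch only: names, binning scheme and link functions are assumptions,
# not the method published in the paper.
import numpy as np


def build_design_matrix(signals, tss_positions, strands, half_width=2000, n_bins=4):
    """Hypothetical feature extraction: for each gene, average each histone-mark
    track in n_bins windows spanning [TSS - half_width, TSS + half_width).

    signals: dict mapping mark name -> 1-D per-base signal array (one chromosome).
    Minus-strand windows are reversed so bins run 5' -> 3'. Windows clipped at
    chromosome ends are simply split into shorter bins.
    """
    rows = []
    for tss, strand in zip(tss_positions, strands):
        feats = []
        for mark in sorted(signals):  # fixed column order across genes
            track = signals[mark]
            window = track[max(tss - half_width, 0):min(tss + half_width, len(track))]
            if strand == "-":
                window = window[::-1]
            for chunk in np.array_split(window, n_bins):
                feats.append(chunk.mean() if chunk.size else 0.0)
        rows.append(feats)
    return np.asarray(rows, dtype=float)


def _with_intercept(X):
    return np.column_stack([np.ones(len(X)), X])


def _sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30, 30)))


def fit_logistic_irls(X, z, n_iter=50, ridge=1e-2):
    """Binomial GLM (logit link) fitted by IRLS; the small L2 penalty
    (an assumption) keeps the solve stable under near-separation."""
    Xd = _with_intercept(X)
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        eta = Xd @ beta
        p = _sigmoid(eta)
        w = np.clip(p * (1.0 - p), 1e-10, None)  # IRLS weights
        work = eta + (z - p) / w                 # working response
        A = Xd.T @ (w[:, None] * Xd) + ridge * np.eye(Xd.shape[1])
        beta = np.linalg.solve(A, Xd.T @ (w * work))
    return beta


def fit_gaussian_glm(X, t):
    """Gaussian GLM with identity link, i.e. ordinary least squares."""
    beta, *_ = np.linalg.lstsq(_with_intercept(X), t, rcond=None)
    return beta


class MasterSlaveGLM:
    """Hypothetical two-part model: the master part classifies expressed
    (y > 0) vs. unexpressed genes; the slave part regresses log1p(y) on
    expressed genes only."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        expressed = y > 0
        self.master_ = fit_logistic_irls(X, expressed.astype(float))
        self.slave_ = fit_gaussian_glm(X[expressed], np.log1p(y[expressed]))
        return self

    def predict_proba(self, X):
        return _sigmoid(_with_intercept(np.asarray(X, dtype=float)) @ self.master_)

    def predict(self, X):
        # E[log1p(y)] = P(expressed) * E[log1p(y) | expressed]; zeros add 0.
        # The level is clamped at 0 because log1p(y) >= 0 for y >= 0.
        X = np.asarray(X, dtype=float)
        level = _with_intercept(X) @ self.slave_
        return self.predict_proba(X) * np.maximum(level, 0.0)
```

Multiplying the master probability by the slave prediction gives the expected log1p expression, so genes the master part considers unexpressed are pulled toward zero. Applying a hard threshold to the master probability would be an equally plausible reading of "master-slave", and the abstract does not settle which one the authors use.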