常巧珍, 曹隽喆, 顾宏, 李丹. Multi-objective clustering algorithm based on the nearest neighbor interval for incomplete gene expression data[J]. , 2021, 61(4):
Multi-objective clustering algorithm based on the nearest neighbor interval for incomplete gene expression data

DOI:10.7511/dllgxb202104011 |
Keywords: gene expression data; missing value; multi-objective clustering; the nearest neighbor rule
Funding: National Natural Science Foundation of China (81872247).

Abstract:
To address the problem of clustering incomplete gene expression data, this paper proposes an algorithm that jointly optimizes missing-value imputation and clustering within the multi-objective NSGA-Ⅱ framework. The algorithm identifies the neighbor genes of each incomplete gene by Euclidean distance and uses the nearest neighbor interval of each missing value as a constraint. Through a mixed encoding, it embeds both missing-value imputation and cluster-center optimization into the NSGA-Ⅱ evolutionary process. Because the statistical information of the dataset and the clustering results are used together as factors in imputing missing values, the algorithm improves both the imputation accuracy and the clustering performance on incomplete gene expression data. Experimental results on multiple gene expression datasets show that the proposed algorithm produces imputed values closer to the true expression values and more compact clusters, and that its clustering results are statistically significant.
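The abstract names the ingredients (Euclidean nearest neighbors, a nearest-neighbor interval constraint, a mixed encoding of imputed values plus cluster centers, NSGA-Ⅱ) but not their exact definitions. The Python sketch below shows one plausible way they could fit together, under assumptions that are ours and not the authors': distance is computed over the columns both genes observe; the interval for a missing cell is the min/max of that column among the k nearest genes that observe it; the chromosome lists one real value per missing cell followed by the flattened cluster centers; out-of-interval values are clipped back into the interval; and the two minimized objectives are within-cluster compactness and negated center separation. All function names are hypothetical.

```python
import itertools
import math

MISSING = float("nan")  # NaN marks a missing expression value in this sketch


def partial_euclidean(a, b):
    """Euclidean distance over the columns observed in both genes."""
    pairs = [(x, y) for x, y in zip(a, b)
             if not (math.isnan(x) or math.isnan(y))]
    if not pairs:
        return math.inf  # no overlap: treat the genes as unrelated
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs))


def nearest_neighbor_intervals(data, k):
    """Map each missing cell (i, j) to (lo, hi): the range of column j
    among the k nearest genes that observe column j (assumed definition)."""
    intervals = {}
    for i, gene in enumerate(data):
        missing_cols = [j for j, v in enumerate(gene) if math.isnan(v)]
        if not missing_cols:
            continue
        ranked = sorted((partial_euclidean(gene, other), idx)
                        for idx, other in enumerate(data) if idx != i)
        for j in missing_cols:
            vals = [data[idx][j] for d, idx in ranked
                    if d < math.inf and not math.isnan(data[idx][j])][:k]
            if not vals:
                # Fallback (assumption): the observed range of column j.
                vals = [row[j] for row in data if not math.isnan(row[j])]
            if not vals:
                raise ValueError(f"column {j} has no observed values")
            intervals[(i, j)] = (min(vals), max(vals))
    return intervals


def decode(chromosome, data, intervals, n_clusters):
    """Split a mixed chromosome into an imputed matrix and cluster centers.

    Assumed layout: one real value per missing cell, in sorted (i, j)
    order, followed by n_clusters * dim center coordinates.
    """
    keys = sorted(intervals)
    filled = [row[:] for row in data]  # copy so the input is not mutated
    for pos, (i, j) in enumerate(keys):
        lo, hi = intervals[(i, j)]
        # Repair step (assumption): clip the imputed value into its interval.
        filled[i][j] = min(max(chromosome[pos], lo), hi)
    dim = len(data[0])
    flat = chromosome[len(keys):]
    centers = [flat[c * dim:(c + 1) * dim] for c in range(n_clusters)]
    return filled, centers


def compactness(filled, centers):
    """Sum of squared distances from each gene to its nearest center."""
    return sum(min(sum((x - c) ** 2 for x, c in zip(row, ctr))
                   for ctr in centers)
               for row in filled)


def separation(centers):
    """Smallest Euclidean distance between any two cluster centers."""
    return min(math.dist(p, q) for p, q in itertools.combinations(centers, 2))


def objectives(chromosome, data, intervals, n_clusters):
    """Both objectives are minimized: compactness and negated separation."""
    filled, centers = decode(chromosome, data, intervals, n_clusters)
    return compactness(filled, centers), -separation(centers)


def dominates(f, g):
    """Pareto dominance for minimization, the relation NSGA-II sorts by."""
    return (all(a <= b for a, b in zip(f, g))
            and any(a < b for a, b in zip(f, g)))


data = [[1.0, 2.0, 3.0], [1.1, 2.1, MISSING], [5.0, 6.0, 7.0],
        [5.2, 6.1, 7.3], [0.9, 1.9, 2.8]]
iv = nearest_neighbor_intervals(data, k=2)  # {(1, 2): (2.8, 3.0)}
chrom = [10.0, 1.0, 2.0, 3.0, 5.1, 6.05, 7.15]  # imputed value, then 2 centers
print(objectives(chrom, data, iv, n_clusters=2))
```

Clipping keeps every chromosome feasible with respect to its interval without a separate penalty term. A complete NSGA-Ⅱ loop would add non-dominated sorting, crowding distance, and crossover/mutation over such mixed chromosomes; those parts are omitted here.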