
scLM: Automatic Detection of Consensus Gene Clusters Across Multiple Single-cell Datasets

Posted by site editor, Free考研考试 / 2022-01-03

In gene expression profiling studies, including single-cell RNA sequencing (scRNA-seq) analyses, identifying and characterizing co-expressed genes provides critical information on cell identity and function. Gene co-expression clustering in scRNA-seq data, however, presents particular challenges. We show that methods commonly used for single-cell data cannot identify co-expressed genes accurately, and that their results fall substantially short of what is biologically expected of co-expressed genes. Herein, we present the single-cell Latent-variable Model (scLM), a gene co-clustering algorithm tailored to single-cell data that performs well at detecting gene clusters with significant biological context. Importantly, scLM can cluster multiple single-cell datasets simultaneously (i.e., consensus clustering), enabling users to leverage single-cell data from multiple sources for novel comparative analyses. scLM takes raw count data as input and preserves biological variation without being influenced by batch effects across datasets. Results on both simulated and experimental data demonstrate that scLM outperforms existing methods with considerably improved accuracy. To illustrate the biological insights that scLM can provide, we apply it to our in-house and public experimental scRNA-seq datasets. scLM identifies novel functional gene modules and refines cell states, which facilitates mechanism discovery and the understanding of complex biosystems such as cancers. A user-friendly R package with all the key features of the scLM method is available at https://github.com/QSong-github/scLM.





Full-text PDF download:

http://gpb.big.ac.cn/articles/download/858