
VASC: Dimension Reduction and Visualization of Single-cell RNA-seq Data by Deep Variational Autoencoder

Posted by the site editor, Free考研考试, 2022-01-03

Single-cell RNA sequencing (scRNA-seq) is a powerful technique for analyzing transcriptomic heterogeneity at the single-cell level. An effective low-dimensional representation and visualization of the original scRNA-seq data is an important step in studying cell sub-populations and lineages. At the single-cell level, transcriptional fluctuations are much larger than in population-averaged measurements, and the low amount of RNA transcripts increases the rate of technical dropout events. scRNA-seq data are therefore much noisier than traditional bulk RNA-seq data. In this study, we propose VASC, a deep variational autoencoder for scRNA-seq data: a deep multi-layer generative model for the unsupervised dimension reduction and visualization of scRNA-seq data. VASC explicitly models dropout events and learns nonlinear hierarchical feature representations of the original data. Tested on over 20 datasets, VASC shows superior performance in most cases and broader dataset compatibility than four state-of-the-art dimension reduction and visualization methods. In addition, VASC provides better representations of very rare cell populations in 2D visualizations. As a case study, VASC successfully re-establishes the cell dynamics of pre-implantation embryos and identifies several candidate marker genes associated with early embryo development. VASC also performs well on a 10× Genomics dataset with more cells and a higher dropout rate.
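A variational autoencoder such as VASC maps each cell to a Gaussian distribution in a low-dimensional latent space rather than to a single point. The sketch below shows two generic VAE ingredients in plain Python: the reparameterization trick used to sample a latent code, and the KL-divergence regularizer in the training objective (ELBO). It is a minimal illustration of the general technique, assuming a diagonal Gaussian encoder and a standard normal prior. The function names are hypothetical, and this is not the VASC implementation, whose architecture and loss are defined in the paper and repository.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1) and sigma = exp(0.5 * log_var).

    Drawing the randomness through eps keeps z a differentiable function of
    mu and log_var, which is what lets a VAE encoder be trained by gradient descent.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def negative_elbo(reconstruction_loss: float, mu: Sequence[float],
                  log_var: Sequence[float]) -> float:
    """Training loss = reconstruction error + KL regularizer (hypothetical helper)."""
    return reconstruction_loss + kl_to_standard_normal(mu, log_var)


# A 2-D latent code, as used for 2D visualization of cells.
mu, log_var = [0.5, -1.0], [0.0, 0.0]
z = reparameterize(mu, log_var, random.Random(0))
print(kl_to_standard_normal(mu, log_var))  # 0.625
```

The KL term pulls each cell's latent distribution toward the prior, which keeps the 2D embedding smooth. The reconstruction term, which in VASC would include the dropout model, is left abstract here.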
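In recent years, the rapid development of single-cell RNA sequencing (scRNA-seq) has enabled researchers to study the transcriptional heterogeneity of biological systems at the single-cell level, information that is usually difficult to obtain from traditional omics data. At the single-cell level, however, random transcriptional fluctuations are far larger than the average behavior of a cell population. In addition, the total amount of RNA in a single cell is extremely low, which makes accurate measurement very challenging. As a result, current single-cell sequencing data are very noisy. Dropout is one of the main sources of this noise: many expressed mRNAs are not captured, so their detected expression is 0. An effective low-dimensional representation can reduce the noise in scRNA-seq data, allowing us to better analyze cell types and states and to visualize the distribution of cells. In this study, we propose VASC, an scRNA-seq analysis method based on a deep variational autoencoder, for the unsupervised dimension reduction and visualization of scRNA-seq data. VASC explicitly models dropout events and uses deep neural networks to discover complex nonlinear patterns and reduce noise in the data, yielding reliable dimension reduction and visualization. We tested the low-dimensional representations produced by VASC on more than 20 datasets, covering the current mainstream scRNA-seq technologies such as SMART-Seq, inDrop and 10X. In most datasets, VASC better captured information about cell types or cell differentiation processes, demonstrating its broad applicability. VASC is freely available at https://github.com/wang-research/VASC.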
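The dropout events described above, where expressed transcripts are recorded as 0, can be illustrated with a small simulation. The sketch below assumes that the dropout probability falls off with expression level as p = exp(-λx²), a double-exponential form used by some zero-inflated models of scRNA-seq data. That form, the parameter `lam`, and all function names are hypothetical choices for illustration, not a statement of VASC's exact dropout layer.

```python
import math
import random
from typing import List, Optional, Sequence


def dropout_probability(x: float, lam: float = 1.0) -> float:
    """Assumed dropout model: p(drop) = exp(-lam * x**2).

    Lowly expressed genes (x near 0) are very likely to be observed as 0,
    while highly expressed genes almost never drop out.
    """
    return math.exp(-lam * x * x)


def apply_dropout(expr: Sequence[float], lam: float = 1.0,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Replace each value with 0 with probability dropout_probability(value)."""
    if rng is None:
        rng = random.Random()
    return [0.0 if rng.random() < dropout_probability(x, lam) else x
            for x in expr]


def zero_fraction(matrix: Sequence[Sequence[float]]) -> float:
    """Fraction of zero entries in a cells x genes matrix (observed dropout rate)."""
    values = [v for row in matrix for v in row]
    return sum(1 for v in values if v == 0) / len(values)


# Toy log-expression profiles for three cells (hypothetical values).
true_expr = [[0.2, 1.0, 3.0, 0.5],
             [0.1, 2.5, 0.3, 1.2],
             [1.8, 0.4, 2.2, 0.0]]
rng = random.Random(42)
observed = [apply_dropout(cell, lam=1.0, rng=rng) for cell in true_expr]
print(zero_fraction(observed))
```

Because low values are zeroed much more often than high ones, the observed zeros are not missing at random. This is why a model that represents dropout explicitly can recover structure that a plain linear method may miss.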





Full-text PDF download:

http://gpb.big.ac.cn/articles/download/665