Evaluation of Cell Type Annotation R Packages on Single-cell RNA-seq Data

Posted by the site editor, Free考研考试 / 2022-01-03

Annotating cell types is a critical step in single-cell RNA sequencing (scRNA-seq) data analysis. Some supervised or semi-supervised classification methods have recently emerged to enable automated cell type identification. However, comprehensive evaluations of these methods are lacking. Moreover, it is not clear whether some classification methods originally designed for analyzing other bulk omics data are adaptable to scRNA-seq analysis. In this study, we evaluated ten cell type annotation methods publicly available as R packages. Eight of them are popular methods developed specifically for single-cell research, including Seurat, scmap, SingleR, CHETAH, SingleCellNet, scID, Garnett, and SCINA. The other two methods were repurposed from deconvoluting DNA methylation data, i.e., linear constrained projection (CP) and robust partial correlations (RPC). We conducted systematic comparisons on a wide variety of public scRNA-seq datasets as well as simulation data. We assessed the accuracy through intra-dataset and inter-dataset predictions; the robustness over practical challenges such as gene filtering, high similarity among cell types, and increased cell type classes; as well as the detection of rare and unknown cell types. Overall, methods such as Seurat, SingleR, CP, RPC, and SingleCellNet performed well, with Seurat being the best at annotating major cell types. Additionally, Seurat, SingleR, CP, and RPC were more robust against downsampling. However, Seurat did have a major drawback at predicting rare cell populations, and it was suboptimal at differentiating cell types highly similar to each other, compared to SingleR and RPC. All the code and data are available from https://github.com/qianhuiSenn/scRNA_cell_deconv_benchmark.

Full-text PDF download:

http://gpb.big.ac.cn/articles/download/854