
Prediction of drought and salt stress-related genes in rice based on multi-platform gene expression data

LIU Ya-Wen 1, ZHANG Hong-Yan 1,2,*, CAO Dan 2, LI Lan-Zhi 2
1 College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, Hunan, China
2 Hunan Engineering and Technology Research Centre for Agricultural Big Data Analysis and Decision-making, Hunan Agricultural University, Changsha 410128, Hunan, China

Corresponding author: * ZHANG Hong-Yan, E-mail: hongyan_zhang@hunau.edu.cn
Received: 2020-11-30; Accepted: 2021-04-26; Published online: 2021-06-02
Funding: Key Scientific Research Project of Hunan Education Department (18A105); Special Commissioner Project of Changsha City for Industrial Science and Technology (201845); "Double First-class" Construction Project of Hunan Agricultural University (SYL2019075)

About authors: E-mail: lyw20201022@163.com



Abstract
Mining stress-related genes in rice from multi-platform gene expression data can increase the reliability of key gene prediction and give more generally applicable results. In this study, 94 Affymetrix microarray samples and 42 RNA-seq transcriptome samples related to rice abiotic stress were collected from NCBI databases. First, multiple datasets of the same type and the same stress were fused by the data transformation method, giving the Affymetrix dataset D_affy and the RNA-seq dataset D_rnaseq for drought stress, and the Affymetrix dataset S_affy and the RNA-seq dataset S_rnaseq for salt stress. Next, each of the four datasets was analyzed by the classical WGCNA method based on the Pearson linear correlation coefficient and by a modified WGCNA method based on the nonlinear MIC coefficient, yielding eight stress-related Hub gene sets. The Hub genes for the same stress were then integrated, giving 1936 drought stress-related and 1504 salt stress-related Hub genes. Finally, the biological significance of the Hub genes was examined from several perspectives: prediction performance, enrichment analysis, literature reports, STRING online interaction analysis, and Cytoscape visualization. The results showed that the overall prediction performance of the Hub genes was good and that most were enriched in pathways related to drought or salt stress; 31 drought stress-responsive genes and 22 salt stress-responsive genes among them had already been reported in the literature. In addition, interaction analysis of the Hub genes predicted 11 drought stress candidate genes and 5 salt stress candidate genes. This study offers a new approach to the analysis of "high-dimensional, small-sample" crop gene sequencing data, and the results provide a reference for research on stress-resistant rice varieties.
Keywords: rice; multi-platform; drought stress; salt stress; WGCNA-MIC


Cite this article:
LIU Ya-Wen, ZHANG Hong-Yan, CAO Dan, LI Lan-Zhi. Prediction of drought and salt stress-related genes in rice based on multi-platform gene expression data. Acta Agronomica Sinica, 2021, 47(12): 2423-2439. DOI: 10.3724/SP.J.1006.2021.02084


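When rice is exposed to abiotic stresses such as drought and high salinity during growth and development, large-scale yield loss, quality decline, and even plant death can result [1]. Improving stress tolerance would raise agricultural output, expand the area suitable for cultivation, and ease population pressure. Stress tolerance in rice is controlled by multiple genes, so mining abiotic stress-related genes from genomic data is of great significance for breeding stress-tolerant varieties. In recent years, with the development of large-scale gene expression profiling, hybridization-based microarrays [2] and high-throughput sequencing-based RNA-seq [3] have been widely used by researchers to mine stress-responsive genes in plants [4].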

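However, most experiments sequence only a limited number of samples, so mining stress-related genes from a single experiment on a given abiotic stress gives unstable and unconvincing results [5]. The large volume of rice stress-related microarray and RNA-seq expression data now accumulated in public databases opens room for multi-platform analysis. Studies have shown that fusing multi-platform data improves the accuracy and reliability of gene expression analysis, and integrated analysis of multi-platform expression data has become a trend in predicting rice abiotic stress-related genes [6]. Current approaches to fusing multi-platform expression data fall into two categories. (1) Meta-analysis, which fuses at the output level. It pools the results of multiple studies, enlarging the total sample size and improving detection accuracy and the consistency of the statistical results [7]. (2) Data transformation, which fuses the raw data. It first rescales expression data from different platforms into a common range by a fixed rule and then merges the transformed data directly into a single expression matrix, increasing the number of samples and easing the "high-dimensional, small-sample" curse of dimensionality [8]. Considering both the small sample size of individual rice experiments and the differences in scale and dimension between rice microarray and RNA-seq data, this study first fused the multiple microarray datasets, and separately the multiple RNA-seq datasets, for the same stress by data transformation; then mined stress-related genes from the fused microarray dataset and the fused RNA-seq dataset separately; and finally combined the two sets of results by meta-analysis to obtain the final stress-responsive genes.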

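To make effective use of multi-platform expression data, we used weighted gene co-expression network analysis (WGCNA) to mine key genes. WGCNA builds modules of co-expressed genes from expression data and identifies key genes from the association between modules and phenotype and from intramodular connectivity [9]; its basic assumption is that genes with similar expression patterns have similar functions. Because it clusters genes with similar expression patterns and relates modules to specific traits or phenotypes, it has been widely applied to mining genes related to drought, salt, and other abiotic stresses in crops. For example, Li et al. [10] used WGCNA to identify 2599 genes associated with cold, drought, and salt stress in rice and predicted 25 key stress-tolerance genes; Zhu et al. [11] applied WGCNA to transcriptome data to determine the core differentially expressed genes and modules responding to salt stress in rice; Lv et al. [12] used WGCNA on transcriptome data to predict the important differentially expressed Hub genes of each module and the main transcription factors regulating drought-responsive gene expression in rice; Hopper et al. [13] combined time-series transcriptomics with WGCNA to provide candidate genes for drought tolerance in grapevine; and Qin et al. [14] used WGCNA to mine core drought-resistance genes in potato roots and confirmed by RT-qPCR that these genes do respond to drought stress. Classical WGCNA measures the linear similarity between the expression levels of two genes with the Pearson correlation coefficient (denoted WGCNA-P) and cannot capture the nonlinear associations that may be widespread among genes. Reshef et al. [15], building on mutual information from information theory, proposed the maximal information coefficient (MIC), a general measure of nonlinear dependence between two variables. In this paper we replace the Pearson correlation coefficient in WGCNA with MIC as the similarity measure for constructing the co-expression network (denoted WGCNA-MIC), so as to capture nonlinear associations among genes. Because the statistical power [16] of MIC is lower than that of the Pearson correlation coefficient in certain linear settings, we built co-expression networks for each dataset with both WGCNA-P and WGCNA-MIC and integrated the resulting Hub gene sets.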

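In summary, this study takes multi-platform microarray and RNA-seq data on rice abiotic stress (represented by drought and salt stress), mines stress-related Hub genes with WGCNA-P and WGCNA-MIC, and then integrates the Hub genes obtained by the two network methods from the different platforms for the same stress into a final stress-related Hub gene set. The biological significance of the Hub genes is then examined from several angles: prediction performance, functional enrichment analysis, literature reports, and interaction network analysis.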

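1 Materials and methods

1.1 Data acquisition and preprocessing

1.1.1 Acquisition and preprocessing of rice microarray data  Rice microarray data were obtained from the NCBI GEO (Gene Expression Omnibus) database (platform GPL2025). Preprocessing was done in R (v3.5.1), as shown in Fig. 1. Quality control was first performed with the arrayQualityMetrics package; expression levels were then computed with the RMA algorithm of the affy package (background correction, normalization, summarization); probes were then annotated with the biomaRt package [17,18], and when several probes mapped to the same gene, the mean of their expression values was taken as the expression of that gene. Four drought-related datasets (GSE6901, GSE21651, GSE23211, GSE26280) were merged to give 62 samples, and three salt-related datasets (GSE6901, GSE14403, GSE16108) were merged to give 32 samples (see Table S1). Batch effects were removed with the removeBatchEffect function of the limma package, and low-expression genes were filtered out before downstream analysis.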
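The probe-to-gene step described above (several probes annotated to one gene are averaged) can be illustrated with a minimal Python sketch. The authors worked in R; the function name, probe IDs, and values below are hypothetical and only show the averaging rule, not the actual pipeline.

```python
from collections import defaultdict

def collapse_probes_to_genes(probe_expr, probe_to_gene):
    """Average the expression of all probes annotated to the same gene.

    probe_expr: dict probe_id -> list of expression values (one per sample)
    probe_to_gene: dict probe_id -> gene_id (unannotated probes are absent)
    """
    grouped = defaultdict(list)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probes without an annotation are skipped
            grouped[gene].append(values)
    gene_expr = {}
    for gene, rows in grouped.items():
        n = len(rows)
        # column-wise mean over the probes of this gene, sample by sample
        gene_expr[gene] = [sum(col) / n for col in zip(*rows)]
    return gene_expr

# Hypothetical probe IDs and values, for illustration only.
probe_expr = {
    "probe_a": [2.0, 4.0],
    "probe_b": [4.0, 8.0],
    "probe_c": [1.0, 1.0],
    "probe_d": [9.0, 9.0],  # has no annotation, so it is dropped
}
probe_to_gene = {"probe_a": "gene1", "probe_b": "gene1", "probe_c": "gene2"}
gene_expr = collapse_probes_to_genes(probe_expr, probe_to_gene)
```

Here `gene1` receives the per-sample mean of its two probes, while `gene2` keeps the values of its single probe.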

Fig. 1 Processing workflow for rice Affymetrix (affy) data

Table S1 Affymetrix microarray data from NCBI

| Stress | GEO accession | Platform | Samples (control/stress) |
|---|---|---|---|
| Drought | GSE6901 | Affymetrix Rice Genome Array (GPL2025) | N = 6 (3/3 drought) |
| Drought | GSE21651 | Affymetrix Rice Genome Array (GPL2025) | N = 8 (4/4 drought) |
| Drought | GSE23211 | Affymetrix Rice Genome Array (GPL2025) | N = 12 (6/6 drought) |
| Drought | GSE26280 | Affymetrix Rice Genome Array (GPL2025) | N = 36 (18/18 drought) |
| Salt | GSE6901 | Affymetrix Rice Genome Array (GPL2025) | N = 6 (3/3 salt) |
| Salt | GSE14403 | Affymetrix Rice Genome Array (GPL2025) | N = 18 (9/9 salt) |
| Salt | GSE16108 | Affymetrix Rice Genome Array (GPL2025) | N = 8 (4/4 salt) |

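1.1.2 Acquisition and preprocessing of rice transcriptome data  Rice RNA-seq data were obtained from the NCBI SRA (Sequence Read Archive) database (Illumina platform): 20 drought-related runs (SRR7054176-83, SRR3051740-45, SRR3051752-57) and 24 salt-related runs (ERR266221-38, SRR3647326-31), the same data used by Li et al. [10] (see Table S2). The preprocessing workflow is shown in Fig. 2. The downloaded SRA files were first converted to fastq sequence files with fasterq-dump (v2.10.7), and the quality of the raw reads was assessed with FastQC (v0.11.9) [19]; quality control with fastp (0.20.1) [20] then produced clean data; the clean data were aligned with Hisat2 (v2.2.10) against the rice reference genome and annotation from the MSU Rice Genome Annotation Project database (http://rice.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_7.0/all.dir/); SAM files were converted to BAM and sorted with Samtools (v0.1.19), after which featureCounts (v2.0.1) [21] gave the raw read count of each gene in each sample; finally, the R package DESeq2 [22] was used to obtain normalized expression values for downstream analysis.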

Table S2 RNA-seq data from NCBI

| Library name | Library description | Library type | Library layout | Rice genotype |
|---|---|---|---|---|
| SRR3647331 | 1-leaf_34_day_Salt | RNAseq | Single | Nipponbare |
| SRR3647329 | 1-leaf_34_day_Salt | RNAseq | Single | Nipponbare |
| SRR3647327 | 1-leaf_34_day_Salt | RNAseq | Single | Nipponbare |
| SRR3647330 | 1-leaf_34_day_Control | RNAseq | Single | Nipponbare |
| SRR3647328 | 1-leaf_34_day_Control | RNAseq | Single | Nipponbare |
| SRR3647326 | 1-leaf_34_day_Control | RNAseq | Single | Nipponbare |
| ERR266228 | 1-Seedling_shoots_2_weeks_1h_Control | RNAseq | Single | Nipponbare |
| ERR266233 | 1-Seedling_shoots_2_weeks_1h_Control | RNAseq | Single | Nipponbare |
| ERR266230 | 1-Seedling_shoots_2_weeks_1h_Control | RNAseq | Single | Nipponbare |
| ERR266225 | 1-Seedling_shoots_2_weeks_24h_Control | RNAseq | Single | Nipponbare |
| ERR266234 | 1-Seedling_shoots_2_weeks_24h_Control | RNAseq | Single | Nipponbare |
| ERR266232 | 1-Seedling_shoots_2_weeks_24h_Control | RNAseq | Single | Nipponbare |
| ERR266229 | 1-Seedling_shoots_2_weeks_5h_Control | RNAseq | Single | Nipponbare |
| ERR266223 | 1-Seedling_shoots_2_weeks_5h_Control | RNAseq | Single | Nipponbare |
| ERR266222 | 1-Seedling_shoots_2_weeks_5h_Control | RNAseq | Single | Nipponbare |
| ERR266237 | 1-Seedling_shoots_2_weeks_1h_Salt | RNAseq | Single | Nipponbare |
| ERR266236 | 1-Seedling_shoots_2_weeks_1h_Salt | RNAseq | Single | Nipponbare |
| ERR266235 | 1-Seedling_shoots_2_weeks_1h_Salt | RNAseq | Single | Nipponbare |
| ERR266238 | 1-Seedling_shoots_2_weeks_24h_Salt | RNAseq | Single | Nipponbare |
| ERR266226 | 1-Seedling_shoots_2_weeks_24h_Salt | RNAseq | Single | Nipponbare |
| ERR266231 | 1-Seedling_shoots_2_weeks_24h_Salt | RNAseq | Single | Nipponbare |
| ERR266227 | 1-Seedling_shoots_2_weeks_5h_Salt | RNAseq | Single | Nipponbare |
| ERR266224 | 1-Seedling_shoots_2_weeks_5h_Salt | RNAseq | Single | Nipponbare |
| ERR266221 | 1-Seedling_shoots_2_weeks_5h_Salt | RNAseq | Single | Nipponbare |
| SRR7054183 | 2-Inflorescence_Control | RNAseq | Paired | Nipponbare |
| SRR7054182 | 2-Inflorescence_Control | RNAseq | Paired | Nipponbare |
| SRR7054181 | 2-Inflorescence_Control | RNAseq | Paired | Nipponbare |
| SRR7054180 | 2-Inflorescence_Control | RNAseq | Paired | Nipponbare |
| SRR7054179 | 2-Inflorescence_Drought | RNAseq | Paired | Nipponbare |
| SRR7054178 | 2-Inflorescence_Drought | RNAseq | Paired | Nipponbare |
| SRR7054177 | 2-Inflorescence_Drought | RNAseq | Paired | Nipponbare |
| SRR7054176 | 2-Inflorescence_Drought | RNAseq | Paired | Nipponbare |
| SRR3051752 | Drought stress_rep1 | RNAseq | Paired | Nipponbare |
| SRR3051753 | Drought stress_rep2 | RNAseq | Paired | Nipponbare |
| SRR3051754 | Drought stress_rep3 | RNAseq | Paired | Nipponbare |
| SRR3051755 | Well-water_rep1 | RNAseq | Paired | Nipponbare |
| SRR3051756 | Well-water_rep2 | RNAseq | Paired | Nipponbare |
| SRR3051757 | Well-water_rep3 | RNAseq | Paired | Nipponbare |
| SRR3051740 | Well-water_rep1 | RNAseq | Paired | Nipponbare |
| SRR3051741 | Well-water_rep2 | RNAseq | Paired | Nipponbare |
| SRR3051742 | Well-water_rep3 | RNAseq | Paired | Nipponbare |
| SRR3051743 | Drought stress_rep1 | RNAseq | Paired | Nipponbare |
| SRR3051744 | Drought stress_rep2 | RNAseq | Paired | Nipponbare |
| SRR3051745 | Drought stress_rep3 | RNAseq | Paired | Nipponbare |

Sample ERR266226 is an outlier; to keep the stress and control groups balanced, this sample and its control ERR266225 were discarded in later processing.

Fig. 2 Processing workflow for rice RNA-seq data



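After the fusion described above of multiple datasets for the same platform and the same stress, four rice datasets were obtained: the microarray dataset D_affy and the RNA-seq dataset D_rnaseq for drought stress, and the microarray dataset S_affy and the RNA-seq dataset S_rnaseq for salt stress (see Table 1).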

Table 1 Rice datasets

| Dataset | No. of genes | Total samples | Control samples | Stress samples |
|---|---|---|---|---|
| Drought stress-related Affymetrix dataset (D_affy) | 27,344 | 62 | 31 | 31 |
| Salt stress-related Affymetrix dataset (S_affy) | 27,344 | 32 | 16 | 16 |
| Drought stress-related RNA-seq dataset (D_rnaseq) | 29,828 | 20 | 10 | 10 |
| Salt stress-related RNA-seq dataset (S_rnaseq) | 28,425 | 22 | 11 | 11 |

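1.2 Co-expression network analysis based on WGCNA-P and WGCNA-MIC

After preprocessing, the data still contain more than 20,000 genes (Table 1). Because running co-expression network analysis on all of them directly would be computationally too expensive, we used the maximal information coefficient (MIC) introduced above for an initial gene screen. For each dataset we computed the MIC between each gene and the phenotype; a higher MIC indicates a stronger association between the gene and the phenotype. The top 30% of genes ranked by MIC were retained for the subsequent co-expression network analysis.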
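The screening rule can be sketched as follows, assuming a gene-to-phenotype score has already been computed for every gene. The sketch does not compute MIC itself; the scores, the gene names, and the rounding and tie-breaking rules are illustrative assumptions that the paper does not specify.

```python
def top_percent(scores, percent=30):
    """Return gene IDs ranked in the top `percent` by score.

    scores: dict gene_id -> precomputed association score with the phenotype
            (MIC in the paper; any score with "higher is stronger" works).
    """
    # Ceiling division in integers; rounding up is an assumption.
    keep = -(-len(scores) * percent // 100)
    # Highest score first; ties are broken by gene ID (also an assumption).
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return ranked[:keep]

# Hypothetical scores for ten genes.
values = [0.91, 0.15, 0.62, 0.40, 0.88, 0.05, 0.33, 0.71, 0.20, 0.55]
mic_scores = {f"g{i}": s for i, s in enumerate(values)}
selected = top_percent(mic_scores)
```

With ten genes and a 30% cut, the three highest-scoring genes are kept.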

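In this study, the classical weighted gene co-expression network WGCNA-P was built directly with functions provided by the R package WGCNA, whereas the modified WGCNA-MIC was implemented with custom code based on functions from the WGCNA package (code in the supplementary material). The main steps for both are as follows:

(1) Compute the similarity matrix $S_{ij}^{\text{unsigned}}=\left| \text{cor}\left( i,j \right) \right|$. In WGCNA-P, the elements of the similarity matrix are the Pearson linear correlation coefficients between gene $i$ and gene $j$; in WGCNA-MIC, they are the MIC values between gene $i$ and gene $j$, that is, $S_{ij}=\text{MIC}(i,j)$;

(2) Define the adjacency matrix $a_{ij}=\left| S_{ij} \right|^{\beta}$, that is, raise the similarity to a power. So that the connections between genes in the network follow a scale-free distribution, the soft threshold $\beta$ is chosen according to the scale-free topology fit index $R^{2}$;

(3) Build the topological overlap matrix $\text{TOM}_{ij}$;

(4) Compute the dissimilarity matrix $\text{disTOM}_{ij}=1-\text{TOM}_{ij}$, build a hierarchical clustering tree, and obtain gene modules with the dynamic tree cut algorithm, with the minimum module size set to 30. Similar modules are then merged with a merge threshold of 0.2 (cutHeight = 0.2).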
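Steps (2) to (4) can be sketched in plain Python for a small similarity matrix. The text does not spell out the topological overlap formula; the sketch assumes the standard unsigned definition, $\text{TOM}_{ij}=(l_{ij}+a_{ij})/(\min(k_i,k_j)+1-a_{ij})$ with $l_{ij}=\sum_{u\ne i,j}a_{iu}a_{uj}$ and $k_i=\sum_{u\ne i}a_{iu}$. The function names, matrix values, and $\beta=2$ are illustrative only, and clustering and module merging are not shown.

```python
def adjacency(similarity, beta):
    """Soft-threshold adjacency a_ij = |s_ij| ** beta, with a zero diagonal."""
    n = len(similarity)
    return [[0.0 if i == j else abs(similarity[i][j]) ** beta
             for j in range(n)] for i in range(n)]

def topological_overlap(adj):
    """Unsigned TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]        # connectivity of each gene
    tom = [[1.0] * n for _ in range(n)]  # TOM_ii is set to 1
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Shared-neighbour term; u = i and u = j add 0 (zero diagonal).
            shared = sum(adj[i][u] * adj[u][j] for u in range(n))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom

# Hypothetical similarity matrix (|Pearson| or MIC) for three genes.
sim = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
adj = adjacency(sim, beta=2)
tom = topological_overlap(adj)
# Dissimilarity that would be passed to hierarchical clustering.
dis_tom = [[1.0 - t for t in row] for row in tom]
```

The same functions serve both variants: only the way `sim` is computed (absolute Pearson correlation or MIC) differs between WGCNA-P and WGCNA-MIC.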

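1.3 Identification of phenotype-related gene modules

Two approaches are commonly used to identify the modules in the network that are significantly related to the phenotype:

(1) Compute the correlation coefficient between the module eigengene (ME) and the phenotype, denoted ME, where the eigengene of a module is defined as its first principal component.

(2) Compute the module significance (MS), defined as the mean gene significance (GS) of all genes in the module [9], where GS is the absolute value of the correlation coefficient between a gene and the phenotypic trait. The larger the ME and MS values of a module, the more strongly it is related to the phenotype. In this study, every correlation in the WGCNA-P method is a Pearson correlation coefficient, and every correlation in the WGCNA-MIC method is an MIC. Taking into account the ME and MS values, the number of modules, and how representative the selected modules are, we selected 1 significant module when there were fewer than 10 modules, 2 when there were at least 10 but fewer than 15, and 4 when there were more than 15.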
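Module significance and the module-count rule can be sketched as below, using the Pearson variant. All names and data are hypothetical. The text states the third tier as "more than 15" and does not cover exactly 15 modules, so folding that case into the last tier is an assumption of the sketch.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def module_significance(expr, modules, trait):
    """MS of a module = mean over its genes of GS = |cor(gene, trait)|."""
    ms = {}
    for name, genes in modules.items():
        gs = [abs(pearson(expr[g], trait)) for g in genes]
        ms[name] = sum(gs) / len(gs)
    return ms

def n_significant_modules(n_modules):
    """Number of modules to keep, following the tiers stated in the text."""
    if n_modules < 10:
        return 1
    if n_modules < 15:
        return 2
    return 4  # covers > 15; treating exactly 15 this way is an assumption

# Hypothetical data: 0 = control, 1 = stress.
trait = [0, 0, 1, 1]
expr = {"g1": [1, 2, 8, 9], "g2": [9, 8, 2, 1],
        "g3": [1, 5, 2, 4], "g4": [3, 1, 4, 2]}
modules = {"blue": ["g1", "g2"], "grey": ["g3", "g4"]}
ms = module_significance(expr, modules, trait)
```

In this toy example the `blue` module, whose genes track the trait closely in either direction, scores a much higher MS than `grey`.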

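1.4 Selection of Hub genes

Prioritizing genes through highly connected hub nodes is a simple way to understand and interpret networks and overall biological complexity [23]. Hub genes were selected according to the correlation between a gene and the phenotypic trait (GS) and the correlation between a gene and the eigengene of its module (module membership, MM). For each stress, applying WGCNA-P and WGCNA-MIC to the datasets from the two platforms gives four Hub gene subsets; these were combined by meta-analysis, taking their union, to give the complete Hub gene set for that stress.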
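A minimal sketch of the thresholding and union steps follows. The gene names and GS/MM values are hypothetical; the thresholds 0.4 and 0.83 are the drought-stress values reported in Section 2.2, and the sketch assumes GS and MM are supplied in the form to which those thresholds apply.

```python
def select_hub_genes(gs, mm, gs_min, mm_min):
    """Genes whose GS exceeds gs_min and whose MM exceeds mm_min."""
    return {g for g in gs if gs[g] > gs_min and mm.get(g, 0.0) > mm_min}

def merge_hub_sets(*hub_sets):
    """Union of the Hub gene subsets obtained for one stress."""
    return set().union(*hub_sets)

# Hypothetical GS/MM values for two of the four subsets of one stress.
affy_p = select_hub_genes(
    gs={"a": 0.9, "b": 0.5, "c": 0.3},
    mm={"a": 0.95, "b": 0.70, "c": 0.90},
    gs_min=0.4, mm_min=0.83)
rnaseq_mic = select_hub_genes(
    gs={"a": 0.6, "d": 0.8},
    mm={"a": 0.5, "d": 0.9},
    gs_min=0.4, mm_min=0.83)
meta_hub = merge_hub_sets(affy_p, rnaseq_mic)
```

Taking the union keeps any gene that passes both thresholds on at least one platform and method, which is why the merged set can be larger than each subset.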

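1.5 Prediction performance of Hub genes

The support vector machine (SVM) is an efficient method for classifying data into two or more classes [24]. To check that the Hub genes were chosen sensibly, we built SVM models on each of the 8 Hub gene subsets for drought and salt stress and on the 2 final complete Hub gene sets, and used them to classify the phenotype. Models were tested by five repetitions of 5-fold cross-validation, and the mean accuracy was taken as the final prediction result.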

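1.6 Functional analysis of Hub genes

1.6.1 GO enrichment and literature analysis  GO enrichment analysis of the Hub genes was performed with the AgriGO enrichment tool (http://systemsbiology.cau.edu.cn/agriGOv2/index.php) [25]. The ontology system of the China Rice Data Center (http://www.ricedata.cn/) was searched with the terms "drought" and "salt", retrieving 250 drought stress-related genes and 363 salt stress-related genes already reported in the literature. The reported genes present in each complete Hub gene set were then analyzed, and the results were used to mine further genes that may be related.

1.6.2 Construction and analysis of the protein interaction network  Protein interaction networks of the Hub genes were built with STRING and Cytoscape. The Hub genes were submitted to the STRING (v11.0) [26] online protein interaction tool (https://string-db.org/) with default settings, and the resulting interaction data were exported. Cytoscape (v3.7.1) [27] was then used to extract and visualize the sub-network of the reported stress-related Hub genes and the genes related to them. Each gene is a node in the network, and an edge between two genes indicates some relationship between them.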

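2 Results and analysis

2.1 WGCNA-P, WGCNA-MIC, and identification of significant modules

As shown in Fig. 3, for the drought stress microarray dataset D_affy, co-expression network analysis with WGCNA-P gave 35 gene modules by dynamic tree cut and 23 after merging; analysis with WGCNA-MIC gave 30 modules, which merged into 10. Different colors denote different modules, and the correlation of each module with drought stress and its module significance are also shown in Fig. 3. The significant modules identified by WGCNA-P (gene counts in parentheses) were brown (833), red (383), darkgrey (106), and purple (260), four modules with 1582 genes in total; those identified by WGCNA-MIC were darkturquoise (1114) and midnightblue (265), two modules with 1379 genes in total.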

Fig. 3 WGCNA network analysis of D_affy data

A, B: gene clustering tree and module division based on the WGCNA-P and WGCNA-MIC methods, respectively; Dynamic Tree Cut denotes the modules from the original calculation, and Merged dynamic denotes the result after merging. C, D: ME and MS values of each module based on the WGCNA-P and WGCNA-MIC methods, respectively; red indicates that the module is positively correlated with drought stress, and blue indicates that it is negatively correlated with drought stress.


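As shown in Fig. 4, for the drought stress RNA-seq dataset D_rnaseq, WGCNA-P and WGCNA-MIC gave 13 and 7 modules, respectively. For the former, two modules, saddlebrown (983) and darkorange (395), with 1378 genes in total were selected; for the latter, the magenta module (4089) was selected for subsequent analysis.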

Fig. 4 WGCNA network analysis of D_rnaseq data

A, B: gene clustering tree and module division based on the WGCNA-P and WGCNA-MIC methods, respectively; Dynamic Tree Cut denotes the modules from the original calculation, and Merged dynamic denotes the result after merging. C, D: ME and MS values of each module based on the WGCNA-P and WGCNA-MIC methods, respectively; red indicates that the module is positively correlated with drought stress, and blue indicates that it is negatively correlated with drought stress.


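For the salt stress microarray data (Fig. S1), WGCNA-P gave 20 modules by dynamic tree cut and 17 after merging, while WGCNA-MIC gave 43 modules, which merged into 40. The modules finally selected with WGCNA-P (gene counts in parentheses) were magenta (213), red (331), purple (199), and pink (803), four modules with 1546 genes in total; with WGCNA-MIC, turquoise (2818), darkturquoise (74), red (210), and brown (409) were selected for subsequent analysis, four modules with 3511 genes in total. For the salt stress RNA-seq data (Fig. S2), the WGCNA-P and WGCNA-MIC networks gave 27 and 19 modules, respectively. For the former, brown (1148), plum1 (73), darkgreen (183), and magenta (351) were selected, four modules with 1755 genes in total; for the latter, pink (308), lightcyan (42), darkgreen (130), and darkturquoise (193) were selected for subsequent analysis, four modules with 673 genes in total.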

Fig. S1 WGCNA network analysis of S_affy data

A, B: gene clustering tree and module division based on the WGCNA-P and WGCNA-MIC methods, respectively; Dynamic Tree Cut denotes the modules from the original calculation, and Merged dynamic denotes the result after merging. C, D: ME and MS values of each module based on the WGCNA-P and WGCNA-MIC methods, respectively; red indicates that the module is positively correlated with salt stress, and blue indicates that it is negatively correlated with salt stress.

Fig. S2 WGCNA network analysis of S_rnaseq data

A, B: gene clustering tree and module division based on the WGCNA-P and WGCNA-MIC methods, respectively; Dynamic Tree Cut denotes the modules from the original calculation, and Merged dynamic denotes the result after merging. C, D: ME and MS values of each module based on the WGCNA-P and WGCNA-MIC methods, respectively; red indicates that the module is positively correlated with salt stress, and blue indicates that it is negatively correlated with salt stress.


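2.2 Prediction performance of Hub genes

In this study, the thresholds for selecting drought stress-related Hub genes were GS > 0.4 and MM > 0.83, and those for salt stress-related Hub genes were GS > 0.3 and MM > 0.75. Applying the two network methods to the four datasets gave 8 Hub gene subsets (D_affy_P, D_affy_MIC, D_rnaseq_P, D_rnaseq_MIC, S_affy_P, S_affy_MIC, S_rnaseq_P, and S_rnaseq_MIC). Meta-analysis of the subsets gave the complete drought stress-related Hub gene set D_meta_hub and the complete salt stress-related Hub gene set S_meta_hub. The SVM classification accuracy of each Hub gene set on the phenotype is shown in Table 2. Overall the Hub genes predicted the phenotype well. On the RNA-seq datasets, Hub genes obtained with WGCNA-MIC gave slightly higher accuracy than those obtained with WGCNA-P, while on the microarray datasets both methods reached 100%. The complete sets D_meta_hub and S_meta_hub matched or slightly exceeded the accuracy of the individual subsets on each dataset. These results indicate that the Hub genes are strongly associated with the phenotype and that the WGCNA-MIC method and the meta-analysis are effective.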

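Table 2 Classification accuracy of Hub genes

| Stress | Hub gene set | Number of genes | Dataset | Average accuracy (%) |
|---|---|---|---|---|
| Drought | D_affy_P | 220 | D_affy | 100 |
| Drought | D_affy_MIC | 104 | D_affy | 100 |
| Drought | D_rnaseq_P | 738 | D_rnaseq | 90.0 |
| Drought | D_rnaseq_MIC | 1634 | D_rnaseq | 91.0 |
| Drought | D_meta_hub | 1936 | D_affy | 100 |
| Drought | D_meta_hub | 1936 | D_rnaseq | 96.0 |
| Salt | S_affy_P | 470 | S_affy | 100 |
| Salt | S_affy_MIC | 293 | S_affy | 100 |
| Salt | S_rnaseq_P | 684 | S_rnaseq | 81.0 |
| Salt | S_rnaseq_MIC | 331 | S_rnaseq | 84.6 |
| Salt | S_meta_hub | 1504 | S_affy | 100 |
| Salt | S_meta_hub | 1504 | S_rnaseq | 84.6 |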

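2.3 GO functional enrichment analysis

Functional enrichment analysis of the drought and salt stress-related Hub gene sets was carried out with the AgriGO online tool. In all three categories, biological process (BP), molecular function (MF), and cellular component (CC), several relevant GO terms were significantly enriched. Detailed results are given in Table 3. For drought stress, the significantly enriched biological process terms include responses to stimuli: response to endogenous stimulus (GO:0009719), response to hormone stimulus (GO:0009725), and response to abiotic stimulus (GO:0009628); metabolism of specialized metabolites: terpenoid metabolic process (GO:0006721); and terms more directly related to drought stress: response to water (GO:0009415) and response to osmotic stress (GO:0006970). Among molecular functions, terms related to signal transduction were significantly enriched, such as receptor activity (GO:0004872) and translation factor activity, nucleic acid binding (GO:0008135), as were terms related to the regulation of certain protein enzymes, such as protein tyrosine kinase activity (GO:0004713). Many cellular component terms were also significantly enriched, such as membrane (GO:0016020). For salt stress, the significantly enriched biological process terms include metabolic processes of various substances: oxoacid metabolic process (GO:0043436) and organic acid metabolic process (GO:0006082); stress response functions: response to endogenous stimulus (GO:0009719); and photosynthesis-related terms: response to light stimulus (GO:0009416). Among molecular functions, the most significantly enriched term was receptor activity (GO:0004872). Among cellular components, many membrane-related terms involved in osmosis were enriched, such as membrane (GO:0016020) and plasma membrane (GO:0005886), together with components involved in photosynthesis, such as chloroplast (GO:0009507).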

Table 3 Partial GO enrichment results for Hub genes

| Stress | GO term | Number of genes | Ontology | Description | P-value |
|---|---|---|---|---|---|
| Drought | GO:0010033 | 12 | BP | Response to organic substance | 9.00E-06 |
| Drought | GO:0009719 | 12 | BP | Response to endogenous stimulus | 9.00E-06 |
| Drought | GO:0009725 | 12 | BP | Response to hormone stimulus | 9.00E-06 |
| Drought | GO:0006721 | 8 | BP | Terpenoid metabolic process | 2.80E-05 |
| Drought | GO:0007165 | 37 | BP | Signal transduction | 2.20E-05 |
| Drought | GO:0009628 | 15 | BP | Response to abiotic stimulus | 0.00054 |
| Drought | GO:0006970 | 5 | BP | Response to osmotic stress | 0.0036 |
| Drought | GO:0009415 | 5 | BP | Response to water | 0.006 |
| Drought | GO:0004872 | 22 | MF | Receptor activity | 9.40E-13 |
| Drought | GO:0004713 | 21 | MF | Protein tyrosine kinase activity | 9.00E-12 |
| Drought | GO:0004722 | 15 | MF | Protein serine/threonine phosphatase activity | 2.70E-05 |
| Drought | GO:0008135 | 12 | MF | Translation factor activity, nucleic acid binding | 0.0099 |
| Drought | GO:0044424 | 1120 | CC | Intracellular part | 0 |
| Drought | GO:0005737 | 1011 | CC | Cytoplasm | 0 |
| Drought | GO:0016020 | 278 | CC | Membrane | 3.10E-23 |
| Salt | GO:0070887 | 6 | BP | Cellular response to chemical stimulus | 2.00E-06 |
| Salt | GO:0007275 | 13 | BP | Multicellular organismal development | 5.30E-08 |
| Salt | GO:0043436 | 52 | BP | Oxoacid metabolic process | 4.80E-06 |
| Salt | GO:0019752 | 52 | BP | Carboxylic acid metabolic process | 4.80E-06 |
| Salt | GO:0006082 | 52 | BP | Organic acid metabolic process | 5.10E-06 |
| Salt | GO:0010033 | 11 | BP | Response to organic substance | 4.70E-06 |
| Salt | GO:0009719 | 11 | BP | Response to endogenous stimulus | 4.70E-06 |
| Salt | GO:0009416 | 6 | BP | Response to light stimulus | 0.024 |
| Salt | GO:0004872 | 14 | MF | Receptor activity | 5.70E-08 |
| Salt | GO:0008135 | 18 | MF | Translation factor activity, nucleic acid binding | 1.00E-06 |
| Salt | GO:0003743 | 11 | MF | Translation initiation factor activity | 3.00E-05 |
| Salt | GO:0016874 | 32 | MF | Ligase activity | 0.00064 |
| Salt | GO:0016020 | 225 | CC | Membrane | 4.60E-21 |
| Salt | GO:0009507 | 20 | CC | Chloroplast | 4.30E-15 |
| Salt | GO:0005886 | 18 | CC | Plasma membrane | 1.20E-09 |

BP: biological process; MF: molecular function; CC: cellular component.

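In summary, the Hub genes obtained by meta-analysis for both stresses were enriched in stress response terms such as response to endogenous stimulus (GO:0009719), response to hormone stimulus (GO:0009725), and response to abiotic stimulus (GO:0009628).

2.4 Analysis of literature reports

To check the reliability of the results, the selected Hub genes were compared with the drought and salt stress-related genes already reported in the literature, as retrieved from the China Rice Data Center. The selected Hub genes include 31 reported drought stress-related genes and 22 reported salt stress-related genes, listed in Table 4.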

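Table 4 Hub genes previously reported as stress-related

| Stress | RAP locus | Gene symbol |
|---|---|---|
| Drought | Os05g0455500 | OsP5CS; OsP5CS1; OsALDH18B1 |
| Drought | Os02g0766700 | OsbZIP23 |
| Drought | Os08g0112700 | OsMADS26 |
| Drought | Os06g0130100 | OsSIK1 |
| Drought | Os09g0552300 | OsRPK1 |
| Drought | Os01g0867300 | OsABF1; OsbZIP12 |
| Drought | Os03g0125100 | DSM2 |
| Drought | Os03g0745000 | OsHsfA2a |
| Drought | Os05g0542500 | OsLEA3; OsLEA3-1 |
| Drought | Os02g0671100 | MAIF1 |
| Drought | Os06g0211200 | OsAREB1; OsbZIP46; OsABF2; ABL1 |
| Drought | Os05g0437700 | EDT1; OsbZIP40 |
| Drought | Os05g0569300 | OsbZIP45 |
| Drought | Os08g0196700 | OsNF-YA7; OsHAP2A |
| Drought | Os03g0230300 | OsSRO1c; BOC1 |
| Drought | Os04g0541700 | Oshox22 |
| Drought | Os03g0286900 | OsRCI2-5 |
| Drought | Os02g0149800 | OsPP18 |
| Drought | Os03g0267000 | OsHSP18.0-CI; OsMSR3; OsSHSP1 |
| Drought | Os05g0475400 | OsAMTR1 |
| Drought | Os08g0408500 | OsERF48; OsDRAP1 |
| Drought | Os04g0676700 | OsMYB6 |
| Drought | Os12g0597500 | OsUAH |
| Drought | Os06g0316000 | Os2H16 |
| Drought | Os06g0612800 | OsiSAP8 |
| Drought | Os04g0572400 | OsDREB1E |
| Drought | Os11g0126900 | OsNAC10; ONAC122 |
| Drought | Os11g0707600 | OsGL1-11 |
| Drought | Os03g0805100 | SQS |
| Drought | Os05g0213500 | OsPYL/RCAR5; OsPYL5; OsPYL11 |
| Drought | Os06g0598800 | WSL1 |
| Salt | Os03g0348900 | OsSRFP1; SDEL2 |
| Salt | Os04g0652400 | OsSULTR3;3; lpa |
| Salt | Os03g0329900 | OsPHR1 |
| Salt | Os07g0187700 | OsPHF1 |
| Salt | Os02g0678200 | OsSPX-MFS2; OsPSS2 |
| Salt | Os02g0325600 | NIGT1 |
| Salt | Os01g0755700 | NBIP1 |
| Salt | Os10g0545700 | OsACR2.1 |
| Salt | Os01g0869900 | OsSAPK4; OSPDK |
| Salt | Os03g0719900 | OsPTR8; OsNPF8.5 |
| Salt | Os09g0434500 | OsBIERF1 |
| Salt | Os02g0121300 | OsCYP2; LRT2 |
| Salt | Os03g0272300 | OsSDIR1 |
| Salt | Os07g0129200 | OsPR1a; OsSCP |
| Salt | Os05g0437700 | EDT1; OsbZIP40 |
| Salt | Os08g0557000 | OsPIMT1 |
| Salt | Os06g0693700 | OsSIDP366 |
| Salt | Os04g0676700 | OsMYB6 |
| Salt | Os01g0612700 | OsLOL2; OsLOL5 |
| Salt | Os01g0948400 | OsP5CR |
| Salt | Os03g0319300 | OsCam1-1 |
| Salt | Os05g0584200 | OsLEA5 |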

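2.5 Construction of the Hub gene interaction network

Protein interactions within the complete Hub gene sets were mined with the online tool STRING and with Cytoscape, focusing on interactions involving Hub genes already reported as stress-related. Hub genes strongly linked to two or more reported Hub genes, that is, with node degree ≥ 2 in the network and a STRING combined_score ≥ 0.9, were considered stress candidate genes worth further study. In Fig. 5 and Fig. 6, red nodes are the reported stress-related Hub genes identified above (excluding genes with no matching protein in the STRING database and genes with no interacting proteins); the larger a node, the more genes are related to it, and the thicker and darker an edge, the stronger the relationship between the two genes. In total, 11 drought stress candidate genes (orange nodes in Fig. 5) and 5 salt stress candidate genes (orange nodes in Fig. 6) that interact with reported Hub genes were found; see Table 5.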
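One way to read the candidate rule is sketched below: a Hub gene that is not itself a reported gene becomes a candidate when it has high-confidence edges (combined_score ≥ 0.9) to at least two reported Hub genes. This is an interpretation of the text, not the authors' code; the edge list, gene names, and function name are hypothetical, and the scores are assumed to be on a 0 to 1 scale.

```python
from collections import defaultdict

def candidate_genes(edges, reported, hubs, min_score=0.9, min_links=2):
    """Hub genes linked to at least `min_links` reported Hub genes.

    edges: iterable of (gene_a, gene_b, combined_score) tuples
    reported: set of Hub genes already reported as stress-related
    hubs: set of all Hub genes (reported and unreported)
    """
    partners = defaultdict(set)
    for a, b, score in edges:
        if score < min_score:
            continue  # keep only high-confidence interactions
        # An edge is undirected, so check it in both orientations.
        for x, y in ((a, b), (b, a)):
            if x in hubs and x not in reported and y in reported:
                partners[x].add(y)
    return sorted(g for g, p in partners.items() if len(p) >= min_links)

# Hypothetical interaction data.
reported = {"R1", "R2", "R3"}
hubs = reported | {"H1", "H2", "H3"}
edges = [
    ("H1", "R1", 0.95), ("R2", "H1", 0.92),  # H1: two reported partners
    ("H2", "R1", 0.99), ("H2", "R2", 0.50),  # H2: second edge is too weak
    ("H3", "R3", 0.91),                      # H3: only one reported partner
    ("H1", "H2", 0.99),                      # edge between unreported genes
]
candidates = candidate_genes(edges, reported, hubs)
```

Raising or lowering `min_score` changes how many candidates are returned, which matches the remark in the Discussion that the threshold can be adjusted.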

Fig. 5 Gene interaction network of drought stress

Fig. 6 Gene interaction network of salt stress

Table 5 Annotation of candidate genes in STRING

| Stress | Candidate gene | Name in STRING | Annotation |
|---|---|---|---|
| Drought | Os01g0733200 | HSF11 | Heat stress transcription factor C-1b; transcriptional regulator that specifically binds DNA of heat shock promoter elements (HSE). |
| Drought | Os07g0178600 | HSF5 | Heat stress transcription factor A-2b; transcriptional regulator that specifically binds DNA of heat shock promoter elements (HSE); belongs to the HSF family, class A subfamily. |
| Drought | Os09g0526600 | HSFB2C | Heat stress transcription factor B-2c; transcriptional regulator that specifically binds DNA of heat shock promoter elements (HSE); belongs to the HSF family, class B subfamily. |
| Drought | Os01g0583100 | OS01T0583100-01 | Probable protein phosphatase 2C 6; belongs to the PP2C family. |
| Drought | Os03g0231700 | OS03T0231700-02 | Os03g0231700 protein; squalene monooxygenase, putative, expressed; cDNA clone: J033045D18, full insert sequence. |
| Drought | Os03g0376100 | OS03T0376100-01 | Os03g0376100 protein. |
| Drought | Os04g0107900 | OS04T0107900-02 | Os04g0107900 protein. |
| Drought | Os01g0840100 | OsJ_04024 | cDNA clone: J100050G20, full insert sequence; 70 kD heat shock protein; Os01g0840100 protein; putative HSP70; uncharacterized protein. |
| Drought | Os03g0277300 | OsJ_10337 | Heat shock cognate 70 kD protein, putative, expressed. |
| Drought | Os05g0460000 | OsJ_18811 | Os05g0460000 protein; putative hsp70; cDNA clone: J090096I11, full insert sequence; belongs to the heat shock protein 70 family. |
| Drought | Os06g0110200 | OsJ_19858 | cDNA clone: 002-135-D09, full insert sequence; Os06g0110200 protein; putative uncharacterized protein OSJNBa0004I20.22; putative uncharacterized protein P0514G12.46. |
| Salt | Os06g0727200 | CATB | Catalase isozyme B; occurs in almost all aerobically respiring organisms and serves to protect cells from the toxic effects of hydrogen peroxide. |
| Salt | Os12g0502300 | CYCA2-1 | Cyclin-A2-1; belongs to the cyclin family, cyclin AB subfamily. |
| Salt | Os03g0821100 | OsJ_13143 | Heat shock cognate 70 kD protein 2, putative, expressed; heat shock protein cognate 70; Os03g0821100 protein; cDNA clone: J023030D03, full insert sequence. |
| Salt | Os10g0491801 | OsJ_31995 | Putative ubiquitin/ribosomal protein S27a fusion protein; ubiquitin fusion protein, putative, expressed. |
| Salt | Os02g0775200 | RFC3 | Replication factor C subunit 3; may be involved in DNA replication and thus regulate cell proliferation. |

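3 Discussion

To cope with drought, plants have evolved many mechanisms at the biochemical, cellular, and molecular levels [28]. They must alter gene expression to activate metabolic processes that promote drought tolerance, including the synthesis and accumulation of specialized metabolites and the species- and genotype-specific production of phenolic compounds, flavonoids, terpenoids, and nitrogen-containing compounds [29]. Salt stress threatens crop growth mainly through osmotic and oxidative effects. It not only causes symptoms such as leaf abscission and necrosis of roots and shoots, but can also delay physiological activities such as photosynthesis, phytohormone function, metabolic pathways, and gene and protein function [30].

This study took rice drought and salt stress data from two platforms, Affymetrix microarrays and RNA-seq, and mined stress-related Hub genes with WGCNA-P and WGCNA-MIC. In terms of prediction performance, every Hub gene set reached an accuracy above 80%, so overall performance was good. In terms of GO enrichment and literature reports, the Hub gene sets for both stresses were enriched in drought and salt response terms such as response to endogenous stimulus (GO:0009719), response to hormone stimulus (GO:0009725), and response to abiotic stimulus (GO:0009628). Some previously reported drought or salt stress-related genes were also found among the Hub genes; EDT1 (Os05g0437700) and OsMYB6 (Os04g0676700) are reported Hub genes for both drought stress and salt stress. A review of the literature on some of the reported genes shows that one stress may be regulated by several interacting genes and that one gene may take part in several abiotic stresses. For example, among the 31 reported drought stress-responsive genes, the expression of OsP5CS (Os05g0455500) [31] is induced by high salinity, drought, cold stress, and ABA treatment; OsMADS26 (Os08g0112700) [32] is a regulatory hub of the rice response to multiple stresses; OsbZIP23 (Os02g0766700) [33] enhances drought and salt tolerance and ABA sensitivity in rice; OsSIK1 (Os06g0130100) [34] plays an important role in salt and drought tolerance in rice; and the expression of OsRPK1 (Os09g0552300) [35] increases under salt stress and is also induced by cold, drought, and abscisic acid. Among the 22 reported salt stress-responsive genes, OsSRFP1 (Os03g0348900) [36] negatively regulates salt and cold tolerance in rice; drought and salt treatment up-regulate OsPTR8 (Os03g0719900) [37]; OsSCP (Os07g0129200) [38] acts in the abiotic stress response by regulating stress-responsive genes; cold and salt stress double the expression of OsPIMT1 (Os08g0557000) [39]; and OsLEA5 (Os05g0584200) [40] is associated with resistance to several abiotic stresses. In addition, in line with the results of Li et al. [10], Zhu et al. [11], and Lv et al. [12], the Hub genes we mined include some transcription factors widely reported to be involved in rice drought and salt stress responses: for drought stress, members of the bZIP family (Os02g0766700, Os06g0211200, Os05g0569300), the MYB family (Os04g0676700), the NAC family (Os11g0126900), and the HSF family (Os03g0745000); for salt stress, members of the bZIP family (Os05g0437700) and the MYB family (Os04g0676700). Finally, by analyzing the interaction network between the reported stress-related Hub genes in the complete Hub gene sets and the genes related to them, we identified candidate genes closely associated with drought or salt stress.

In summary, integrating rice multi-platform gene expression data by meta-analysis can uncover key genes for drought and salt stress in rice and offers a useful reference for mining abiotic stress-responsive genes in crops. In the STRING analysis, the number of candidate genes depends on the threshold chosen. The candidates in this study were obtained with combined_score ≥ 0.9; the threshold can be adjusted as needed, and the candidates remain to be validated by real-time quantitative PCR (RT-qPCR).

4 Conclusions

Using weighted gene co-expression network analysis, meta-analysis, and protein interaction network analysis on multi-platform data, we obtained 1936 drought stress-related and 1504 salt stress-related Hub genes in rice. Of these, 31 drought stress-related and 22 salt stress-related Hub genes had already been reported in the literature, and 11 drought stress and 5 salt stress candidate genes were predicted. Because multi-platform data for other abiotic stresses in rice (such as cold and heat stress) have a similar structure and rest on similar experimental principles, the method can be extended to mining genes related to other abiotic stresses. This study offers a new approach to making full use of multi-platform data for mining rice abiotic stress-related genes and provides a reference for further research on stress-resistant rice varieties.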

参考文献 原文顺序
文献年度倒序
文中引用次数倒序
被引期刊影响因子

Gong Z Z, Xiong L M, Shi H Z, Yang S H, Herrera-Estrella L R, Xu G H, Chao D Y, Li J R, Wang P Y, Qin F, Li J G, Ding Y L, Shi Y T, Wang Y, Yang Y Q, Guo Y, Zhu J K. Plant abiotic stress response and nutrient use efficiency
Sci China (Life Sci Edn), 2020, 63:635-674.

[本文引用: 1]

Hossain M R, Bassel G W, Pritchard J, Sharma G P, Ford-Lloyd B V. Trait specific expression profiling of salt stress responsive genes in diverse rice genotypes as determined by modified significance analysis of microarrays
Front Plant Sci, 2016, 7:567.

[本文引用: 1]

Zhou Y, Yang P, Cui F L, Zhang F T, Luo X D, Xie J K. Transcriptome analysis of salt stress responsiveness in the seedlings of Dongxiang wild rice (Oryza rufipogon Griff.)
PLoS One, 2016, 11(1):e0146242.

DOIURL [本文引用: 1]

Song T, Das D, Yang F, Chen M X, Tian Y, Cheng C L, Sun C, Xu W F, Zhang J H. Genome-wide transcriptome analysis of roots in two rice varieties in response to alternate wetting and drying irrigation
Crop J, 2020, 8:586-601.

DOIURL [本文引用: 1]

Kaur S, Iquebal M A, Jaiswal S, Tandon G, Sundaram R M, Gautam R K, Suresh K P, Rai A, Kumar D. A meta-analysis of potential candidate genes associated with salinity stress tolerance in rice
Agric Gene, 2016, 1:126-134.

[本文引用: 1]

Serin E A R, Nijveen H, Hilhorst H W M, Ligterink W. Learning from co-expression networks: possibilities and challenges
Front Plant Sci, 2016, 7:444.

[本文引用: 1]

Zhou G Y, Soufan O, Ewald J, Hancock R E W, Basu N, Xia J G. NetworkAnalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis
Nucleic Acids Res, 2019, 47(W1):W234-W241.

DOIURL [本文引用: 1]

王凯莉, 张礼, 刘学军. 融合多平台表达数据的转录组差异表达分析
计算机学报, 2018, 41:1195-1210.

[本文引用: 1]

Wang K L, Zhang L, Liu X J. Differential expression analysis based on integrating transcriptome expression data from multiple platforms
Chin J Comp, 2018, 41:1195-1210 (in Chinese with English abstract).

[本文引用: 1]

Cheng Y F, Li L, Qin Z S, Li X, Qi F. Identification of castration-resistant prostate cancer-related hub genes using weighted gene co-expression network analysis
J Cell Mol Med, 2020, 24:1-12.

DOIURL [本文引用: 2]

李旭凯, 李任建, 张宝俊. 利用WGCNA鉴定非生物胁迫相关基因共表达网络
作物学报, 2019, 45:1349-1364.

[本文引用: 3]

Li X K, Li R J, Zhang B J. Identification of rice stress-related gene co-expression modules by WGCNA
Acta Agron Sin, 2019, 45:1349-1364 (in Chinese with English abstract).

[本文引用: 3]

Zhu M D, Xie H J, Wei X J, Dossa K, Yu Y Y, Hui S J, Tang G H, Zeng X S, Yu Y H, Hu P S, Wang J L. WGCNA analysis of salt-responsive core transcriptome identifies novel hub genes in rice
Genes, 2019, 10:719.

DOIURL [本文引用: 2]

Lv Y M, Xu L, Dossa K, Zhou K, Zhu M D, Xie H J, Tang S J, Yu Y Y, Guo X Y, Zhou B. Identification of putative drought-responsive genes in rice using gene co-expression analysis
Bioinformation, 2019, 15:480-488.

DOIURL [本文引用: 2]

Hopper D W, Ghan R, Schlauch K A, Cramer G R. Transcriptomic network analyses of leaf dehydration responses identify highly connected ABA and ethylene signaling hubs in three grapevine species differing in drought tolerance
BMC Plant Biol, 2016, 16:118.

Qin T Y, Sun C, Bi Z Z, Liang W J, Li P C, Zhang J L, Bai J P. Identification of drought-related co-expression modules and hub genes in potato roots based on WGCNA
Acta Agron Sin, 2020, 46:1033-1051 (in Chinese with English abstract).

Reshef D N, Reshef Y A, Finucane H K, Grossman S R, McVean G, Turnbaugh P J, Lander E S, Mitzenmacher M, Sabeti P C. Detecting novel associations in large data sets
Science, 2011, 334:1518-1524.

Britt C L, Weisburd D. Statistical power. In: Handbook of Quantitative Criminology
New York: Springer, 2010. pp 313-332.

Durinck S, Spellman P T, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt
Nat Protoc, 2009, 4:1184-1191.

Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis
Bioinformatics, 2005, 21:3439-3440.

Kroll K W, Mokaram N E, Pelletier A R, Frankhouser D E, Westphal M S, Stump P A, Stump C L, Bundschuh R, Blachly J S, Yan P. Quality control for RNA-Seq (QuaCRS): an integrated quality control pipeline
Cancer Inform, 2014, 13(S3):7-14.

Chen S F, Zhou Y Q, Chen Y R, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor
Bioinformatics, 2018, 34:i884-i890.

Liao Y, Smyth G K, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features
Bioinformatics, 2014, 30:923-930.

Love M I, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
Genome Biol, 2014, 15:550.

Smita S, Katiyar A, Pandey D M, Chinnusamy V, Archak S, Bansal K C. Identification of conserved drought stress responsive gene-network across tissues and developmental stages in rice
Bioinformation, 2013, 9:72-78.

Shaik R, Ramakrishna W. Machine learning approaches distinguish multiple stress conditions using stress-responsive genes and identify candidate genes for broad resistance in rice
Plant Physiol, 2014, 164:481-495.

Tian T, Liu Y, Yan H Y, You Q, Yi X, Du Z, Xu W Y, Su Z. AgriGO v2.0: a GO analysis toolkit for the agricultural community
Nucleic Acids Res, 2017, 45(W1):W122-W129.

Szklarczyk D, Gable A L, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva N T, Morris J H, Bork P, Jensen L J, Mering C V. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
Nucleic Acids Res, 2019, 47(D1):D607-D613.

Shannon P, Markiel A, Ozier O, Baliga N S, Wang J T, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks
Genome Res, 2003, 13:2498-2504.

Wang S C, Tu H F, Hu D, Wu N, Cen X, Xiong L Z. The exploitation of rice functional genes for abiotic stress
Chin Bull Life Sci, 2016, 28:1216-1229 (in Chinese with English abstract).

Zahedi S M, Karimi M, Venditti A. Plants adapted to arid areas: specialized metabolites
Nat Prod Res, 2019: 1-18.

Murad M A, Khan A L, Muneer S. Silicon in horticultural crops: cross-talk, signaling, and tolerance mechanism under salinity stress
Plants, 2020, 9:460.

Sripinyowanich S, Klomsakul P, Boonburapong B, Bangyeekhun T, Asami T, Gu Y H, Buaboocha T, Chadchawan S. Exogenous ABA induces salt tolerance in indica rice (Oryza sativa L.): the role of OsP5CS1 and OsP5CR gene expression during salt stress
Environ Exp Bot, 2013, 86:94-105.

Ngan K G, Pati P K, Richaud F, Parizot B, Bidzinski P, Mai C D, Bès M, Bourrié I, Meynard D, Beeckman T, Selvaraj M G, Manabu I, Genga A M, Brugidou C, Do V N, Guiderdoni E, Morel J B, Gantet P. OsMADS26 negatively regulates resistance to pathogens and drought tolerance in rice
Plant Physiol, 2015, 169:2935-2949.

Zong W, Tang N, Yang J, Peng L, Ma S Q, Xu Y, Li G L, Xiong L Z. Feedback regulation of ABA signaling and biosynthesis by a bZIP transcription factor targets drought-resistance-related genes
Plant Physiol, 2016, 171:2810-2825.

Ouyang S Q, Liu Y F, Liu P, Lei G, He S J, Ma B, Zhang W K, Zhang J S, Chen S Y. Receptor-like kinase OsSIK1 improves drought and salt stress tolerance in rice (Oryza sativa) plants
Plant J, 2010, 62:316-329.

Cheng Y W, Qi Y C, Zhu Q, Chen X, Wang N, Zhao X, Chen H Y, Cui X J, Xu L L, Zhang W. New changes in the plasma-membrane-associated proteome of rice roots under salt stress
Proteomics, 2009, 9:3100-3114.

Fang H M, Meng Q L, Xu J W, Tang H J, Tang S Y, Zhang H S, Huang J. Knock-down of stress inducible OsSRFP1 encoding an E3 ubiquitin ligase with transcriptional activation activity confers abiotic stress tolerance through enhancing antioxidant protection in rice
Plant Mol Biol, 2015, 87:441-458.

Ouyang J, Cai Z Y, Xia K F, Wang Y Q, Duan J, Zhang M Y. Identification and analysis of eight peptide transporter homologs in rice
Plant Sci, 2010, 179:374-382.

Kothari K S, Dansana P K, Jitender G, Tyagi A K. Rice stress associated protein 1 (OsSAP1) interacts with aminotransferase (OsAMTR1) and pathogenesis-related 1a protein (OsSCP) and regulates abiotic stress responses
Front Plant Sci, 2016, 7:1057.

Wei Y D, Xu H B, Diao L R, Zhu Y H, Xie H G, Cai Q H, Wu F X, Wang Z H, Zhang J F, Xie H A. Protein repair L-isoaspartyl methyl transferase 1 (PIMT1) in rice improves seed longevity by preserving embryo vigor and viability
Plant Mol Biol, 2015, 89:475-492.

He S, Tan L L, Hu Z L, Chen G P, Wang G X, Hu T Z. Molecular characterization and functional analysis by heterologous expression in E. coli under diverse abiotic stresses for OsLEA5, the atypical hydrophobic LEA protein from Oryza sativa L
Mol Genet Genom, 2012, 287:39-54.
