Prediction of drought and salt stress-related genes in rice based on multi-platform gene expression data

LIU Ya-Wen¹, ZHANG Hong-Yan¹,²,*, CAO Dan², LI Lan-Zhi²

¹College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, Hunan, China

²Hunan Engineering and Technology Research Centre for Agricultural Big Data Analysis and Decision-making, Hunan Agricultural University, Changsha 410128, Hunan, China

Corresponding author: * ZHANG Hong-Yan, E-mail: hongyan_zhang@hunau.edu.cn

Received: 2020-11-30; Accepted: 2021-04-26; Published online: 2021-06-02

Fund supported:

Key Scientific Research Project of Hunan Education Department (18A105)
Special Commissioner Project of Changsha City for Industrial Science and Technology (201845)
“Double First-class” Construction Project of Hunan Agricultural University (SYL2019075)

About authors
E-mail: lyw20201022@163.com

Abstract
Mining stress-related genes in rice from multi-platform gene expression data can increase the reliability of key gene prediction and yield more generally applicable results. In this study, 94 Affymetrix microarray samples and 42 RNA-seq transcriptome samples related to rice abiotic stress were collected from NCBI databases. First, multiple datasets of the same type related to the same stress were fused by the data conversion method, giving the drought-related Affymetrix dataset D_affy and RNA-seq dataset D_rnaseq, and the salt-related Affymetrix dataset S_affy and RNA-seq dataset S_rnaseq. Next, gene co-expression network analysis was performed on each of the four datasets with the classical WGCNA method based on the Pearson linear correlation coefficient and with an improved WGCNA method based on the MIC nonlinear correlation coefficient, giving eight stress-related Hub gene sets. The Hub genes for each stress were then integrated, giving 1936 drought-related and 1504 salt-related Hub genes. Finally, the biological significance of the Hub genes was examined from several angles: prediction performance, enrichment analysis, literature reports, STRING online interaction analysis, and Cytoscape visualization. The Hub genes showed good overall prediction performance, and most were enriched in pathways related to drought or salt stress; 31 drought-responsive and 22 salt-responsive genes among them have been reported in the literature. In addition, interaction analysis of the Hub genes predicted 11 drought candidate genes and 5 salt candidate genes. This study offers a new approach to the effective analysis of “high-dimensional, small-sample” crop gene sequencing data, and the results provide a reference for research on stress-resistant rice varieties.
Keywords: rice; multi-platform; drought stress; salt stress; WGCNA-MIC

Cite this article
LIU Ya-Wen, ZHANG Hong-Yan, CAO Dan, LI Lan-Zhi. Prediction of drought and salt stress-related genes in rice based on multi-platform gene expression data. Acta Agronomica Sinica, 2021, 47(12): 2423-2439. DOI:10.3724/SP.J.1006.2021.02084

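When rice is exposed to abiotic stresses such as drought and high salinity during growth and development, large-scale yield loss, quality decline, and even plant death can result [1]. Improving stress resistance would raise agricultural output, expand the area suitable for cultivation, and ease population pressure. Stress resistance in rice is controlled by multiple genes, so mining abiotic stress-related genes from genomic data is of great importance for breeding new stress-resistant rice varieties. In recent years, with the development of large-scale gene expression measurement technologies, hybridization-based microarrays [2] and RNA-seq based on high-throughput sequencing [3] have been widely used by researchers to mine stress-responsive genes in plants [4].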

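However, most experiments sequence only a limited number of samples. Mining stress-related genes from the sequencing data of a single experiment on one abiotic stress yields unstable and unconvincing results [5]. The large volume of rice stress-related microarray and RNA-seq expression data accumulated in public databases now makes multi-platform analysis possible. Studies have shown that fusing multi-platform data improves the accuracy and reliability of gene expression analysis, and integrated analysis of multi-platform expression data has become a trend in predicting rice abiotic stress-related genes [6]. Approaches to fusing multi-platform gene expression data generally fall into two classes: (1) Meta-analysis, which fuses at the output level. It pools the results of multiple studies, enlarging the total sample size and improving detection accuracy and the consistency of statistical results [7]. (2) Data conversion, which fuses the raw data. It first transforms expression data from different platforms into a common range according to a given rule, then merges the transformed data directly into one expression matrix, increasing the sample count and alleviating the “high-dimension, small-sample” curse of dimensionality [8]. Considering the small sample size of individual rice experiments and the differences in scale and dimension between rice microarray and RNA-seq data, this study first fuses multiple microarray datasets, or multiple RNA-seq datasets, related to the same stress by data conversion; then mines stress-related genes separately from the fused microarray dataset and the fused RNA-seq dataset; and finally applies meta-analysis to the two sets of results to obtain the final stress-responsive genes.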

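To make effective use of multi-platform gene expression data, this study uses weighted gene co-expression network analysis (WGCNA) to mine key genes. WGCNA builds modules of co-expressed genes from expression data and identifies key genes from the association between modules and phenotype and from intramodular connectivity [9]; its basic assumption is that genes with similar expression patterns have similar functions. Because it clusters genes with similar expression patterns and analyzes the association between modules and specific traits or phenotypes, it is widely used to mine genes related to abiotic stresses such as drought and salt in crops. For example, Li et al. [10] used WGCNA to identify 2599 genes related to cold, drought, and salt stress in rice and predicted 25 key stress-resistance genes; Zhu et al. [11] applied WGCNA to transcriptome data and identified core differentially expressed genes and modules in the rice salt stress response; Lv et al. [12] applied WGCNA to transcriptome data and predicted important Hub differentially expressed genes in each module and the main transcription factors regulating drought-responsive gene expression in rice; Hopper et al. [13] combined time-series transcriptomics with WGCNA to provide candidate genes for drought tolerance research in grapevine; and Qin et al. [14] used WGCNA to mine core drought-resistance genes in potato roots and confirmed by RT-qPCR that these core genes respond to drought stress. Classical WGCNA measures the linear similarity between the expression levels of two genes with the Pearson correlation coefficient (denoted WGCNA-P), but it cannot capture the nonlinear associations that may be widespread among genes. Reshef et al. [15] proposed the maximal information coefficient (MIC), a general-purpose measure based on mutual information from information theory that can quantify nonlinear dependence between two variables. This paper proposes replacing the Pearson correlation coefficient in WGCNA with MIC as the similarity measure when building the gene co-expression network (denoted WGCNA-MIC), so as to capture nonlinear associations among genes. Because the statistical power of MIC [16] is lower than that of the Pearson correlation coefficient in certain linear settings, this study builds co-expression networks for each dataset with both WGCNA-P and WGCNA-MIC and integrates the Hub gene sets obtained from each.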

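In summary, this study takes microarray and RNA-seq data related to rice abiotic stress (represented by drought and salt stress) from multiple platforms as its research object. Stress-related Hub genes are mined with WGCNA-P and WGCNA-MIC, and the Hub genes obtained by the two network analysis methods from the different platforms for the same stress are then integrated to give the final stress-related Hub gene set. Finally, the biological significance of the Hub genes is examined from several angles: prediction performance, gene function enrichment analysis, literature reports, and interaction network analysis.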

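1 Materials and methods

1.1 Data acquisition and preprocessing

1.1.1 Acquisition and preprocessing of rice microarray data

The rice microarray data were obtained from the NCBI GEO (Gene Expression Omnibus) database (platform GPL2025). Preprocessing was carried out in R (v3.5.1), following the workflow in Fig. 1. First, the arrayQualityMetrics package was used for quality control. Next, expression levels were computed with the RMA algorithm of the affy package (background correction, normalization, and summarization). Probes were then annotated with the biomaRt package [17,18]; when multiple probes mapped to the same gene, the mean expression of those probes was taken as the expression of the gene. Four drought-related datasets (GSE6901, GSE21651, GSE23211, and GSE26280) were merged to give 62 samples, and three salt-related datasets (GSE6901, GSE14403, and GSE16108) were merged to give 32 samples (see Table S1). Batch effects were removed with the removeBatchEffect function of the limma package, and low-expression genes were filtered out before downstream analysis.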
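The probe-to-gene averaging rule described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function name, the dictionary layout, and the probe and gene identifiers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def average_probes_per_gene(probe_expr, probe_to_gene):
    """Collapse probe-level expression to gene level by averaging.

    probe_expr: dict mapping probe ID -> list of expression values (one per sample)
    probe_to_gene: dict mapping probe ID -> gene ID (unannotated probes are absent)
    """
    grouped = defaultdict(list)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # skip probes without an annotation
            grouped[gene].append(values)
    gene_expr = {}
    for gene, rows in grouped.items():
        n = len(rows)
        # Average sample by sample across all probes of the gene.
        gene_expr[gene] = [sum(col) / n for col in zip(*rows)]
    return gene_expr


# Hypothetical probe IDs and values, for illustration only.
probe_expr = {
    "probe_a": [8.0, 9.0],
    "probe_b": [10.0, 11.0],
    "probe_c": [5.0, 6.0],
}
probe_to_gene = {"probe_a": "gene_1", "probe_b": "gene_1", "probe_c": "gene_2"}
gene_expr = average_probes_per_gene(probe_expr, probe_to_gene)
```

Here the two probes annotated to `gene_1` are averaged per sample, while `gene_2` keeps the values of its single probe.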

Fig. 1 Processing workflow for the rice Affymetrix data

Table S1 Affymetrix microarray data from NCBI

Stress	GEO accession	Platform	Samples (control/stress)
Drought	GSE6901	Affymetrix Rice Genome Array (GPL2025)	N = 6 (3/3 drought)
Drought	GSE21651	Affymetrix Rice Genome Array (GPL2025)	N = 8 (4/4 drought)
	GSE23211	Affymetrix Rice Genome Array (GPL2025)	N = 12 (6/6 drought)
	GSE26280	Affymetrix Rice Genome Array (GPL2025)	N = 36 (18/18 drought)
Salt	GSE6901	Affymetrix Rice Genome Array (GPL2025)	N = 6 (3/3 salt)
Salt	GSE14403	Affymetrix Rice Genome Array (GPL2025)	N = 18 (9/9 salt)
	GSE16108	Affymetrix Rice Genome Array (GPL2025)	N = 8 (4/4 salt)


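1.1.2 Acquisition and preprocessing of rice transcriptome data

The rice RNA-seq transcriptome data were obtained from the NCBI SRA (Sequence Read Archive) database (Illumina platform). The drought-related data comprise 20 runs (SRR7054176-83, SRR3051740-45, and SRR3051752-57), and the salt-related data comprise 24 runs (ERR266221-38 and SRR3647326-31); these are the data used by Li et al. [10] (see Table S2). The preprocessing workflow is shown in Fig. 2. First, fasterq-dump (v2.10.7) was used to convert the downloaded SRA files to FASTQ sequence files, and FastQC (v0.11.9) [19] was used to assess the quality of the raw reads. Next, fastp (v0.20.1) [20] was used for quality control to obtain clean data. The clean data were then aligned with Hisat2 (v2.2.10) against the rice reference genome and annotation from the MSU Rice Genome Annotation Project database (http://rice.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_7.0/all.dir/). Samtools (v0.1.19) was used to convert the SAM files to BAM files and sort them, and featureCounts (v2.0.1) [21] was used to obtain the raw read count of each gene in each sample. Finally, the R package DESeq2 [22] was used to obtain normalized gene expression values for downstream analysis.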

Table S2 RNA-seq data from NCBI

Library name	Library description	Library type	Library layout	Rice genotype
SRR3647331	1-leaf_34_day_Salt	RNAseq	Single	Nipponbare
SRR3647329	1-leaf_34_day_Salt	RNAseq	Single	Nipponbare
SRR3647327	1-leaf_34_day_Salt	RNAseq	Single	Nipponbare
SRR3647330	1-leaf_34_day_Control	RNAseq	Single	Nipponbare
SRR3647328	1-leaf_34_day_Control	RNAseq	Single	Nipponbare
SRR3647326	1-leaf_34_day_Control	RNAseq	Single	Nipponbare
ERR266228	1-Seedling_shoots_2_weeks_1h_Control	RNAseq	Single	Nipponbare
ERR266233	1-Seedling_shoots_2_weeks_1h_Control	RNAseq	Single	Nipponbare
ERR266230	1-Seedling_shoots_2_weeks_1h_Control	RNAseq	Single	Nipponbare
ERR266225	1-Seedling_shoots_2_weeks_24h_Control	RNAseq	Single	Nipponbare
ERR266234	1-Seedling_shoots_2_weeks_24h_Control	RNAseq	Single	Nipponbare
ERR266232	1-Seedling_shoots_2_weeks_24h_Control	RNAseq	Single	Nipponbare
ERR266229	1-Seedling_shoots_2_weeks_5h_Control	RNAseq	Single	Nipponbare
ERR266223	1-Seedling_shoots_2_weeks_5h_Control	RNAseq	Single	Nipponbare
ERR266222	1-Seedling_shoots_2_weeks_5h_Control	RNAseq	Single	Nipponbare
ERR266237	1-Seedling_shoots_2_weeks_1h_Salt	RNAseq	Single	Nipponbare
ERR266236	1-Seedling_shoots_2_weeks_1h_Salt	RNAseq	Single	Nipponbare
ERR266235	1-Seedling_shoots_2_weeks_1h_Salt	RNAseq	Single	Nipponbare
ERR266238	1-Seedling_shoots_2_weeks_24h_Salt	RNAseq	Single	Nipponbare
ERR266226	1-Seedling_shoots_2_weeks_24h_Salt	RNAseq	Single	Nipponbare
ERR266231	1-Seedling_shoots_2_weeks_24h_Salt	RNAseq	Single	Nipponbare
ERR266227	1-Seedling_shoots_2_weeks_5h_Salt	RNAseq	Single	Nipponbare
ERR266224	1-Seedling_shoots_2_weeks_5h_Salt	RNAseq	Single	Nipponbare
ERR266221	1-Seedling_shoots_2_weeks_5h_Salt	RNAseq	Single	Nipponbare
SRR7054183	2-Inflorescence_Control	RNAseq	Paired	Nipponbare
SRR7054182	2-Inflorescence_Control	RNAseq	Paired	Nipponbare
SRR7054181	2-Inflorescence_Control	RNAseq	Paired	Nipponbare
SRR7054180	2-Inflorescence_Control	RNAseq	Paired	Nipponbare
SRR7054179	2-Inflorescence_Drought	RNAseq	Paired	Nipponbare
SRR7054178	2-Inflorescence_Drought	RNAseq	Paired	Nipponbare
SRR7054177	2-Inflorescence_Drought	RNAseq	Paired	Nipponbare
SRR7054176	2-Inflorescence_Drought	RNAseq	Paired	Nipponbare
SRR3051752	Drought stress_rep1	RNAseq	Paired	Nipponbare
SRR3051753	Drought stress_rep2	RNAseq	Paired	Nipponbare
SRR3051754	Drought stress_rep3	RNAseq	Paired	Nipponbare
SRR3051755	Well-water_rep1	RNAseq	Paired	Nipponbare
SRR3051756	Well-water_rep2	RNAseq	Paired	Nipponbare
SRR3051757	Well-water_rep3	RNAseq	Paired	Nipponbare
SRR3051740	Well-water_rep1	RNAseq	Paired	Nipponbare
SRR3051741	Well-water_rep2	RNAseq	Paired	Nipponbare
SRR3051742	Well-water_rep3	RNAseq	Paired	Nipponbare
SRR3051743	Drought stress_rep1	RNAseq	Paired	Nipponbare
SRR3051744	Drought stress_rep2	RNAseq	Paired	Nipponbare
SRR3051745	Drought stress_rep3	RNAseq	Paired	Nipponbare

Sample ERR266226 is an outlier. To keep the stress and control groups balanced, this sample and its matched control ERR266225 were discarded in subsequent processing.

Fig. 2 Processing workflow for the rice RNA-seq data

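After fusing the multiple datasets from the same platform and the same stress as described above, four rice datasets were obtained: the drought-related microarray dataset D_affy and RNA-seq dataset D_rnaseq, and the salt-related microarray dataset S_affy and RNA-seq dataset S_rnaseq (Table 1).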

Table 1 Rice datasets

Data set	No. of genes	Total samples	Control samples	Stress samples
Drought stress-related Affymetrix dataset (D_affy)	27,344	62	31	31
Salt stress-related Affymetrix dataset (S_affy)	27,344	32	16	16
Drought stress-related RNA-seq dataset (D_rnaseq)	29,828	20	10	10
Salt stress-related RNA-seq dataset (S_rnaseq)	28,425	22	11	11


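1.2 Co-expression network analysis based on WGCNA-P and WGCNA-MIC

After preprocessing, each dataset still contains more than 20,000 genes (Table 1). Because running co-expression network analysis directly on all of them is computationally too expensive, this study uses the maximal information coefficient (MIC) introduced above for an initial gene screen. For each dataset, the MIC between each gene and the phenotype was computed; a higher MIC indicates a stronger association between gene and phenotype. The 30% of genes with the highest MIC values were retained for the co-expression network analysis.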
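The top-30% screen can be sketched as follows, assuming the gene-phenotype MIC values have already been computed elsewhere (computing MIC itself is outside this sketch). The function name, the rounding rule (rounding the number of retained genes up), and the scores are assumptions made for illustration, not details given in the text.

```python
def top_percent(scores, percent=30):
    """Return the genes with the highest scores, keeping `percent` % of them.

    scores: dict mapping gene ID -> precomputed gene-phenotype MIC value
    The number of retained genes is rounded up (an assumption of this sketch).
    """
    n_keep = (len(scores) * percent + 99) // 100  # integer ceiling
    ranked = sorted(scores, key=lambda gene: scores[gene], reverse=True)
    return ranked[:n_keep]


# Hypothetical MIC values for ten genes, for illustration only.
mic_scores = {
    "g1": 0.91, "g2": 0.15, "g3": 0.67, "g4": 0.42, "g5": 0.88,
    "g6": 0.30, "g7": 0.55, "g8": 0.73, "g9": 0.21, "g10": 0.49,
}
selected = top_percent(mic_scores, percent=30)
```

With ten genes and a 30% cutoff, three genes are kept, ordered from highest to lowest score.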

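In this study, the classical weighted gene co-expression network WGCNA-P was built directly with the functions provided by the WGCNA package in R, while the improved WGCNA-MIC method was implemented in custom code built on the relevant functions of the WGCNA package (code in the supplementary material). The main steps for building both networks are as follows:

(1) Compute the similarity matrix $S_{ij}^{\text{unsigned}} = \left| \text{cor}(i,j) \right|$. In WGCNA-P, the elements of the similarity matrix are the Pearson linear correlation coefficients between gene i and gene j; in WGCNA-MIC, they are the MIC nonlinear correlation coefficients between gene i and gene j, that is, $S_{ij} = \text{MIC}(i,j)$;

(2) Define the adjacency matrix $a_{ij} = \left| S_{ij} \right|^{\beta}$, that is, raise the similarity to a power. So that the connections among genes in the network follow a scale-free distribution, the soft threshold $\beta$ is chosen according to the scale-free topology fit index $R^{2}$;

(3) Build the topological overlap matrix $\text{TOM}_{ij}$;

(4) Compute the dissimilarity matrix $\text{disTOM}_{ij} = 1 - \text{TOM}_{ij}$, build a hierarchical clustering tree, and obtain gene modules with the dynamic tree cut algorithm, setting the minimum module size to 30 genes. Similar modules are then merged with a merge threshold of 0.2 (cutHeight = 0.2).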
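Steps (1) to (4) can be traced on a toy example with the pure-Python sketch below. It is not the WGCNA package or the authors' code. The topological overlap formula is not spelled out in the text; the sketch assumes the standard unsigned definition from the WGCNA literature, $\text{TOM}_{ij} = (l_{ij} + a_{ij}) / (\min(k_i, k_j) + 1 - a_{ij})$ with $l_{ij} = \sum_{u \ne i,j} a_{iu} a_{uj}$ and $k_i = \sum_{u \ne i} a_{iu}$. The soft threshold $\beta = 6$ and the expression values are illustrative, and hierarchical clustering and dynamic tree cutting are omitted.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def similarity_matrix(expr, measure=pearson):
    """Step 1: S_ij = |measure(i, j)|; pass an MIC function for WGCNA-MIC."""
    n = len(expr)
    return [[abs(measure(expr[i], expr[j])) for j in range(n)] for i in range(n)]


def adjacency_matrix(sim, beta):
    """Step 2: a_ij = |S_ij| ** beta, with the diagonal set to zero."""
    n = len(sim)
    return [[0.0 if i == j else abs(sim[i][j]) ** beta for j in range(n)]
            for i in range(n)]


def tom_matrix(adj):
    """Step 3: unsigned topological overlap (assumed standard definition)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity; diagonal is already zero
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j]
                             for u in range(n) if u != i and u != j)
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


def dis_tom(tom):
    """Step 4: dissimilarity used as the clustering distance."""
    return [[1.0 - value for value in row] for row in tom]


# Three hypothetical genes measured in four samples.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 3.0, 2.0, 4.0],
    [4.0, 1.0, 3.0, 2.0],
]
sim = similarity_matrix(expr)
adj = adjacency_matrix(sim, beta=6)
tom = tom_matrix(adj)
dist = dis_tom(tom)
```

Swapping `pearson` for an MIC implementation in `similarity_matrix` is the only change needed to move from the WGCNA-P variant to the WGCNA-MIC variant in this sketch; the later steps are identical.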

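1.3 Identification of phenotype-related gene modules

Two approaches are commonly used to identify modules in the network that are significantly associated with the phenotype:

(1) Compute the correlation coefficient between the module eigengene (ME) and the phenotype, denoted ME, where the eigengene of a module is defined as its first principal component.

(2) Compute the module significance (MS), defined as the mean gene significance (GS) of all genes in the module [9], where GS is the absolute value of the correlation coefficient between a gene and the phenotypic trait. The larger the ME and MS values of a module, the more strongly it is associated with the phenotype. In this study, all correlation coefficients in WGCNA-P are Pearson correlation coefficients, and all those in WGCNA-MIC are MIC values. Taking into account the ME and MS values, the number of modules, and the representativeness of the selected modules, one significant module was selected when there were fewer than 10 modules, two when there were at least 10 but fewer than 15, and four when there were more than 15.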
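The GS and MS definitions and the module-count rule can be written out as a short Python sketch. The function names and data are hypothetical, Pearson correlation stands in for the correlation measure (the WGCNA-P case), and the text does not say what happens with exactly 15 modules, so the sketch raises an error in that case rather than guessing.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def gene_significance(expr_row, trait, measure=pearson):
    """GS: absolute correlation between one gene and the phenotype."""
    return abs(measure(expr_row, trait))


def module_significance(module_genes, expr, trait, measure=pearson):
    """MS: mean GS over all genes assigned to the module."""
    values = [gene_significance(expr[g], trait, measure) for g in module_genes]
    return sum(values) / len(values)


def modules_to_select(n_modules):
    """Number of significant modules to keep, following the rule in the text."""
    if n_modules < 10:
        return 1
    if n_modules < 15:
        return 2
    if n_modules > 15:
        return 4
    raise ValueError("the text gives no rule for exactly 15 modules")


# Hypothetical data: phenotype coded 0 (control) / 1 (stress) for four samples.
trait = [0.0, 0.0, 1.0, 1.0]
expr = {"gene_a": [1.0, 2.0, 3.0, 4.0], "gene_b": [4.0, 3.0, 2.0, 1.0]}
ms = module_significance(["gene_a", "gene_b"], expr, trait)
```

Because GS uses the absolute value, the up-regulated `gene_a` and the down-regulated `gene_b` contribute equally to the module significance.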

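1.4 Hub gene selection

Prioritizing genes by the highly connected hub nodes of a network is a convenient way to understand and interpret the network and overall biological complexity [23]. Hub genes were selected according to the GS value (the correlation between a gene and the phenotypic trait) and the MM value (the correlation between a gene and the eigengene of its module). For each stress, applying WGCNA-P and WGCNA-MIC to the data from the two platforms gives four Hub gene subsets; these were combined by meta-analysis, taking their union, to give the overall Hub gene set for that stress.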
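The threshold-and-union logic can be sketched in a few lines of Python. The function names, gene IDs, and GS/MM values are hypothetical; the cutoffs in the example (GS > 0.4 and MM > 0.83) are the drought thresholds reported in Section 2.2.

```python
def select_hubs(gs, mm, gs_min, mm_min):
    """Keep genes whose GS and MM both exceed the given thresholds.

    gs, mm: dicts mapping gene ID -> gene significance / module membership
    """
    return {g for g in gs if gs[g] > gs_min and mm.get(g, 0.0) > mm_min}


def merge_hub_sets(*hub_sets):
    """Overall Hub gene set for one stress: the union of its subsets."""
    return set().union(*hub_sets)


# Hypothetical GS and MM values from one dataset and one method.
gs = {"g1": 0.90, "g2": 0.50, "g3": 0.20}
mm = {"g1": 0.95, "g2": 0.80, "g3": 0.90}
subset_a = select_hubs(gs, mm, gs_min=0.4, mm_min=0.83)

# A second, hypothetical subset from another platform or method.
subset_b = {"g1", "g4"}
overall = merge_hub_sets(subset_a, subset_b)
```

Taking the union means a gene found by any one platform-method combination is retained, which is why the overall set can be much larger than any single subset.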

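1.5 Prediction performance of Hub genes

The support vector machine (SVM) provides an efficient way to classify data into two or more classes [24]. To verify that the Hub gene selection is reasonable, SVM models were built in turn on the eight Hub gene subsets and the two overall Hub gene sets for drought and salt stress to classify the phenotype. Each model was tested with five repetitions of 5-fold cross-validation, and the mean accuracy was taken as the final prediction result.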
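The evaluation protocol (five repetitions of 5-fold cross-validation, averaged) can be sketched in plain Python. To keep the example self-contained, a nearest-centroid classifier is used as a stand-in for the SVM; it is not the authors' model. The splitting scheme (plain shuffled folds, not stratified), the function names, the seed, and the toy data are all assumptions of this sketch.

```python
import random


def repeated_kfold_accuracy(X, y, fit, predict, k=5, repeats=5, seed=0):
    """Mean test accuracy over `repeats` rounds of shuffled k-fold CV."""
    rng = random.Random(seed)
    n = len(y)
    accuracies = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[f::k] for f in range(k)]  # k disjoint test folds
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
            accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)


def fit_centroids(X, y):
    """Stand-in classifier (not an SVM): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, value in enumerate(row):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(model, row):
    """Assign the class whose centroid is nearest in squared distance."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(model, key=lambda label: dist2(model[label]))


# Hypothetical, well-separated toy data: 10 control and 10 stress samples.
X = [[float(i % 3), 1.0] for i in range(10)] + [[10.0 + i % 3, 1.0] for i in range(10)]
y = [0] * 10 + [1] * 10
accuracy = repeated_kfold_accuracy(X, y, fit_centroids, predict_centroid)
```

Any classifier exposing a fit function and a predict function with these signatures, including an SVM from a machine learning library, could be passed in place of the stand-in.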

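1.6 Functional analysis of Hub genes

1.6.1 GO enrichment and literature report analysis

GO enrichment analysis of the Hub genes was performed with the AgriGO enrichment tool (http://systemsbiology.cau.edu.cn/agriGOv2/index.php) [25]. The ontology system of the China Rice Data Center (http://www.ricedata.cn/) was searched with the terms “干旱” (drought) and “盐” (salt), yielding 250 drought-related genes and 363 salt-related genes reported in the literature. The reported genes present in each overall Hub gene set were then examined, and the results were used to mine further candidate related genes.

1.6.2 Construction and analysis of the protein interaction network

The protein interaction network of the Hub genes was built with STRING and Cytoscape. The Hub genes were submitted to the STRING (v11.0) [26] online protein interaction analysis tool (https://string-db.org/) with default settings to build the network, and the interaction data were exported. Cytoscape (v3.7.1) [27] was used to extract and visualize the subnetwork of reported stress-related Hub genes and their associated genes. Each gene is represented by a node, and an edge between two genes indicates some relationship between them.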

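2 Results and analysis

2.1 WGCNA-P, WGCNA-MIC, and identification of significant modules

As shown in Fig. 3, co-expression network analysis of the drought microarray data D_affy with WGCNA-P gave 35 gene modules after dynamic tree cutting, which were merged into 23 modules; with WGCNA-MIC, 30 modules were merged into 10. Different colors along the vertical axis represent different modules. The correlation of each module with drought stress and the module significance are shown in Fig. 3. The significant modules identified by WGCNA-P (gene counts in parentheses) were brown (833), red (383), darkgrey (106), and purple (260), four modules totaling 1582 genes; those identified by WGCNA-MIC were darkturquoise (1114) and midnightblue (265), two modules totaling 1379 genes.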

Fig. 3 WGCNA network analysis of D_affy data

A, B: gene clustering tree and module assignment based on WGCNA-P and WGCNA-MIC, respectively; Dynamic Tree Cut shows the modules from the original computation, and Merged dynamic shows the result after merging. C, D: ME and MS values of each module based on WGCNA-P and WGCNA-MIC, respectively; red indicates that the module is positively correlated with drought stress, and blue that it is negatively correlated.

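As shown in Fig. 4, for the drought RNA-seq data D_rnaseq, WGCNA-P and WGCNA-MIC gave 13 and 7 modules, respectively. Two modules, saddlebrown (983) and darkorange (395), totaling 1378 genes, were selected from the former, and the magenta module (4089) was selected from the latter for downstream analysis.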

Fig. 4 WGCNA network analysis of D_rnaseq data

A, B: gene clustering tree and module assignment based on WGCNA-P and WGCNA-MIC, respectively; Dynamic Tree Cut shows the modules from the original computation, and Merged dynamic shows the result after merging. C, D: ME and MS values of each module based on WGCNA-P and WGCNA-MIC, respectively; red indicates that the module is positively correlated with drought stress, and blue that it is negatively correlated.

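For the salt microarray data (Fig. S1), WGCNA-P gave 20 modules after dynamic tree cutting, which were merged into 17; WGCNA-MIC gave 43 modules, which were merged into 40. The modules finally selected by WGCNA-P were magenta (213), red (331), purple (199), and pink (803), four modules totaling 1546 genes; those selected by WGCNA-MIC were turquoise (2818), darkturquoise (74), red (210), and brown (409), four modules totaling 3511 genes. For the salt RNA-seq data (Fig. S2), the WGCNA-P and WGCNA-MIC networks gave 27 and 19 modules, respectively. Four modules, brown (1148), plum1 (73), darkgreen (183), and magenta (351), totaling 1755 genes, were selected from the former, and four modules, pink (308), lightcyan (42), darkgreen (130), and darkturquoise (193), totaling 673 genes, were selected from the latter for downstream analysis.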

Fig. S1 WGCNA network analysis of S_affy data

A, B: gene clustering tree and module assignment based on WGCNA-P and WGCNA-MIC, respectively; Dynamic Tree Cut shows the modules from the original computation, and Merged dynamic shows the result after merging. C, D: ME and MS values of each module based on WGCNA-P and WGCNA-MIC, respectively; red indicates that the module is positively correlated with salt stress, and blue that it is negatively correlated.

Fig. S2 WGCNA network analysis of S_rnaseq data

A, B: gene clustering tree and module assignment based on WGCNA-P and WGCNA-MIC, respectively; Dynamic Tree Cut shows the modules from the original computation, and Merged dynamic shows the result after merging. C, D: ME and MS values of each module based on WGCNA-P and WGCNA-MIC, respectively; red indicates that the module is positively correlated with salt stress, and blue that it is negatively correlated.

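2.2 Prediction performance of Hub genes

In this study, the Hub gene selection thresholds were GS > 0.4 and MM > 0.83 for drought stress, and GS > 0.3 and MM > 0.75 for salt stress. Applying the two network analysis methods to the four datasets gave eight Hub gene subsets (D_affy_P, D_affy_MIC, D_rnaseq_P, D_rnaseq_MIC, S_affy_P, S_affy_MIC, S_rnaseq_P, and S_rnaseq_MIC). Meta-analysis of the subsets gave the overall drought-related Hub gene set D_meta_hub and the overall salt-related Hub gene set S_meta_hub. The SVM classification accuracy of each Hub gene set is shown in Table 2. Overall, the Hub genes predicted the phenotype well. Hub genes obtained by WGCNA-MIC were as accurate as those obtained by WGCNA-P on the microarray datasets and slightly more accurate on the RNA-seq datasets, and the overall sets D_meta_hub and S_meta_hub matched or exceeded the accuracy of the individual subsets on each dataset. These results indicate that the Hub genes are strongly associated with the phenotypic traits and that the WGCNA-MIC method and the meta-analysis are effective.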

Table 2 Classification accuracy of Hub genes

Stress	Hub gene set	Number of genes	Data set	Average accuracy (%)
Drought	D_affy_P	220	D_affy	100
	D_affy_MIC	104	D_affy	100
	D_rnaseq_P	738	D_rnaseq	90.0
	D_rnaseq_MIC	1634	D_rnaseq	91.0
	D_meta_hub	1936	D_affy	100
	D_meta_hub	1936	D_rnaseq	96.0
Salt	S_affy_P	470	S_affy	100
	S_affy_MIC	293	S_affy	100
	S_rnaseq_P	684	S_rnaseq	81.0
	S_rnaseq_MIC	331	S_rnaseq	84.6
	S_meta_hub	1504	S_affy	100
	S_meta_hub	1504	S_rnaseq	84.6

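2.3 GO functional enrichment analysis

The AgriGO online enrichment tool was used to analyze the drought- and salt-related Hub gene sets separately. Multiple relevant GO terms were significantly enriched in all three categories: biological process (BP), molecular function (MF), and cellular component (CC). Detailed results are given in Table 3. For drought stress, the significantly enriched biological process terms include stimulus-response terms such as response to endogenous stimulus (GO:0009719), response to hormone stimulus (GO:0009725), and response to abiotic stimulus (GO:0009628); terms for specialized metabolite metabolism such as terpenoid metabolic process (GO:0006721); and terms more directly tied to drought, such as response to water (GO:0009415) and response to osmotic stress (GO:0006970). In molecular function, the enriched terms include signaling-related terms such as receptor activity (GO:0004872) and translation factor activity, nucleic acid binding (GO:0008135), as well as terms related to protein-modifying enzymes such as protein tyrosine kinase activity (GO:0004713). Many cellular component terms were also significantly enriched, such as membrane (GO:0016020). For salt stress, the significantly enriched biological process terms include metabolic terms such as oxoacid metabolic process (GO:0043436) and organic acid metabolic process (GO:0006082); stress-response terms such as response to endogenous stimulus (GO:0009719); and photosynthesis-related terms such as response to light stimulus (GO:0009416). In molecular function, the most significantly enriched term was receptor activity (GO:0004872). In cellular component, many membrane-related terms involved in osmosis were enriched, such as membrane (GO:0016020) and plasma membrane (GO:0005886), along with photosynthesis-related components such as chloroplast (GO:0009507).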

Table 3 Partial GO enrichment results for Hub genes

Stress	GO term	Number of genes	Ontology	Description	P-value
Drought	GO:0010033	12	BP	Response to organic substance	9.00E-06
	GO:0009719	12	BP	Response to endogenous stimulus	9.00E-06
	GO:0009725	12	BP	Response to hormone stimulus	9.00E-06
	GO:0006721	8	BP	Terpenoid metabolic process	2.80E-05
	GO:0007165	37	BP	Signal transduction	2.20E-05
	GO:0009628	15	BP	Response to abiotic stimulus	0.00054
	GO:0006970	5	BP	Response to osmotic stress	0.0036
	GO:0009415	5	BP	Response to water	0.006
	GO:0004872	22	MF	Receptor activity	9.40E-13
	GO:0004713	21	MF	Protein tyrosine kinase activity	9.00E-12
	GO:0004722	15	MF	Protein serine/threonine phosphatase activity	2.70E-05
	GO:0008135	12	MF	Translation factor activity, nucleic acid binding	0.0099
	GO:0044424	1120	CC	Intracellular part	0
	GO:0005737	1011	CC	Cytoplasm	0
	GO:0016020	278	CC	Membrane	3.10E-23
Salt	GO:0070887	6	BP	Cellular response to chemical stimulus	2.00E-06
	GO:0007275	13	BP	Multicellular organismal development	5.30E-08
	GO:0043436	52	BP	Oxoacid metabolic process	4.80E-06
	GO:0019752	52	BP	Carboxylic acid metabolic process	4.80E-06
	GO:0006082	52	BP	Organic acid metabolic process	5.10E-06
	GO:0010033	11	BP	Response to organic substance	4.70E-06
	GO:0009719	11	BP	Response to endogenous stimulus	4.70E-06
	GO:0009416	6	BP	Response to light stimulus	0.024
	GO:0004872	14	MF	Receptor activity	5.70E-08
	GO:0008135	18	MF	Translation factor activity, nucleic acid binding	1.00E-06
	GO:0003743	11	MF	Translation initiation factor activity	3.00E-05
	GO:0016874	32	MF	Ligase activity	0.00064
	GO:0016020	225	CC	Membrane	4.60E-21
	GO:0009507	20	CC	Chloroplast	4.30E-15
	GO:0005886	18	CC	Plasma membrane	1.20E-09

BP: biological process; MF: molecular function; CC: cellular component.

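In summary, the Hub genes obtained by meta-analysis for both stresses were enriched in stress-response terms such as response to endogenous stimulus (GO:0009719), response to hormone stimulus (GO:0009725), and response to abiotic stimulus (GO:0009628).

2.4 Analysis of literature reports

To verify the reliability of the results, the selected Hub genes were checked against the reported drought- and salt-related genes obtained from the China Rice Data Center. The selected Hub genes include 31 reported drought-related genes and 22 reported salt-related genes (Table 4).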

Table 4 Hub genes reported to be stress-related

Stress	RAP locus	Gene symbol	Stress	RAP locus	Gene symbol
Drought	Os05g0455500	OsP5CS; OsP5CS1; OsALDH18B1	Drought	Os03g0286900	OsRCI2-5
Drought	Os02g0766700	OsbZIP23	Drought	Os02g0149800	OsPP18
	Os08g0112700	OsMADS26		Os03g0267000	OsHSP18.0-CI; OsMSR3; OsSHSP1
	Os06g0130100	OsSIK1		Os05g0475400	OsAMTR1
	Os09g0552300	OsRPK1		Os08g0408500	OsERF48; OsDRAP1
	Os01g0867300	OsABF1; OsbZIP12		Os04g0676700	OsMYB6
	Os03g0125100	DSM2		Os12g0597500	OsUAH
	Os03g0745000	OsHsfA2a		Os06g0316000	Os2H16
	Os05g0542500	OsLEA3; OsLEA3-1		Os06g0612800	OsiSAP8
	Os02g0671100	MAIF1		Os04g0572400	OsDREB1E
	Os06g0211200	OsAREB1; OsbZIP46; OsABF2; ABL1		Os11g0126900	OsNAC10; ONAC122
	Os05g0437700	EDT1; OsbZIP40		Os11g0707600	OsGL1-11
	Os05g0569300	OsbZIP45		Os03g0805100	SQS
	Os08g0196700	OsNF-YA7; OsHAP2A		Os05g0213500	OsPYL/RCAR5; OsPYL5; OsPYL11
	Os03g0230300	OsSRO1c; BOC1		Os06g0598800	WSL1
	Os04g0541700	Oshox22
Salt	Os03g0348900	OsSRFP1; SDEL2	Salt	Os02g0121300	OsCYP2; LRT2
	Os04g0652400	OsSULTR3;3; lpa		Os03g0272300	OsSDIR1
	Os03g0329900	OsPHR1		Os07g0129200	OsPR1a; OsSCP
	Os07g0187700	OsPHF1		Os05g0437700	EDT1; OsbZIP40
	Os02g0678200	OsSPX-MFS2; OsPSS2		Os08g0557000	OsPIMT1
	Os02g0325600	NIGT1		Os06g0693700	OsSIDP366
	Os01g0755700	NBIP1		Os04g0676700	OsMYB6
	Os10g0545700	OsACR2.1		Os01g0612700	OsLOL2; OsLOL5
	Os01g0869900	OsSAPK4; OSPDK		Os01g0948400	OsP5CR
	Os03g0719900	OsPTR8; OsNPF8.5		Os03g0319300	OsCam1-1
	Os09g0434500	OsBIERF1		Os05g0584200	OsLEA5


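2.5 Construction of the Hub gene interaction network

The STRING online tool and Cytoscape were used to mine protein interactions within the overall Hub gene sets, focusing on interactions with Hub genes already reported as stress-related. A Hub gene was considered a candidate stress gene for further study if it was strongly linked to two or more reported Hub genes, that is, if its node degree in the network was ≥ 2 with a STRING combined_score ≥ 0.9. In Fig. 5 and Fig. 6, red nodes are the reported stress-related Hub genes identified above (excluding genes with no matching protein in STRING and genes with no associated proteins); larger nodes have more associated genes, and thicker, darker edges indicate stronger relationships. In total, 11 drought candidate genes (orange nodes in Fig. 5) and 5 salt candidate genes (orange nodes in Fig. 6) with protein interactions with reported Hub genes were found (Table 5).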
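The candidate criterion can be sketched over an exported edge list as below. This reflects one reading of the rule: a Hub gene that is not itself reported becomes a candidate when it has at least two edges with combined_score ≥ 0.9 to distinct reported Hub genes. The function name, the tuple layout of the edge list, and the gene IDs are hypothetical, and scores are assumed to be on the 0 to 1 scale used in the text.

```python
def candidate_genes(edges, reported, hubs, min_score=0.9, min_links=2):
    """Hub genes strongly linked to at least `min_links` reported Hub genes.

    edges: iterable of (gene_a, gene_b, combined_score) tuples
    reported: set of Hub genes already reported as stress-related
    hubs: set of all Hub genes for the stress (reported ones included)
    """
    links = {}
    for a, b, score in edges:
        if score < min_score:
            continue  # ignore low-confidence interactions
        for gene, partner in ((a, b), (b, a)):
            if gene in hubs and gene not in reported and partner in reported:
                links.setdefault(gene, set()).add(partner)
    return sorted(g for g, partners in links.items() if len(partners) >= min_links)


# Hypothetical interaction data, for illustration only.
edges = [
    ("c1", "r1", 0.95),
    ("c1", "r2", 0.92),
    ("c2", "r1", 0.97),
    ("c2", "r2", 0.50),  # below the score threshold
    ("r1", "r2", 0.99),  # both already reported, so neither is a candidate
]
reported = {"r1", "r2"}
hubs = {"c1", "c2", "r1", "r2"}
candidates = candidate_genes(edges, reported, hubs)
```

Lowering `min_score` or `min_links` widens the candidate list, which is the threshold sensitivity noted in the discussion.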

Fig. 5 Gene interaction network of drought stress

Fig. 6 Gene interaction network of salt stress

Table 5 Annotation of candidate genes in STRING

Stress	Candidate gene	Name in STRING	Annotation
Drought	Os01g0733200	HSF11	Heat stress transcription factor C-1b; transcriptional regulator that specifically binds DNA of heat shock promoter elements (HSE).
	Os07g0178600	HSF5	Heat stress transcription factor A-2b; transcriptional regulator that specifically binds DNA of heat shock promoter elements (HSE); belongs to the HSF family, class A subfamily.
	Os09g0526600	HSFB2C	Heat stress transcription factor B-2c; transcriptional regulator that specifically binds DNA of heat shock promoter elements (HSE); belongs to the HSF family, class B subfamily.
	Os01g0583100	OS01T0583100-01	Probable protein phosphatase 2C 6; belongs to the PP2C family.
	Os03g0231700	OS03T0231700-02	Os03g0231700 protein; squalene monooxygenase, putative, expressed; cDNA clone: J033045D18, full insert sequence.
	Os03g0376100	OS03T0376100-01	Os03g0376100 protein.
	Os04g0107900	OS04T0107900-02	Os04g0107900 protein.
	Os01g0840100	OsJ_04024	cDNA clone: J100050G20, full insert sequence; 70 kD heat shock protein; Os01g0840100 protein; putative HSP70; uncharacterized protein.
	Os03g0277300	OsJ_10337	Heat shock cognate 70 kD protein, putative, expressed.
	Os05g0460000	OsJ_18811	Os05g0460000 protein; putative hsp70; cDNA clone: J090096I11, full insert sequence; belongs to the heat shock protein 70 family.
	Os06g0110200	OsJ_19858	cDNA clone: 002-135-D09, full insert sequence; Os06g0110200 protein; putative uncharacterized protein OSJNBa0004I20.22; putative uncharacterized protein P0514G12.46.
Salt	Os06g0727200	CATB	Catalase isozyme B; occurs in almost all aerobically respiring organisms and serves to protect cells from the toxic effects of hydrogen peroxide.
	Os12g0502300	CYCA2-1	Cyclin-A2-1; belongs to the cyclin family, cyclin AB subfamily.
	Os03g0821100	OsJ_13143	Heat shock cognate 70 kD protein 2, putative, expressed; heat shock protein cognate 70; Os03g0821100 protein; cDNA clone: J023030D03, full insert sequence.
	Os10g0491801	OsJ_31995	Putative ubiquitin/ribosomal protein S27a fusion protein; ubiquitin fusion protein, putative, expressed.
	Os02g0775200	RFC3	Replication factor C subunit 3; may be involved in DNA replication and thus regulate cell proliferation.

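3 Discussion

To cope with drought, plants have evolved many mechanisms at the biochemical, cellular, and molecular levels [28]. They must alter gene expression to activate metabolic processes that promote drought tolerance, including the synthesis and accumulation of specialized metabolites and the species- and genotype-specific production of phenolics, flavonoids, terpenoids, and nitrogen-containing compounds [29]. Salt stress threatens crop growth mainly through osmotic and oxidative effects. It not only causes symptoms such as leaf abscission and necrosis of roots and shoots, but can also delay physiological activities such as photosynthesis, phytohormone function, metabolic pathways, and gene/protein function [30].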

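This study took rice drought- and salt-related data from two platforms, Affymetrix microarrays and RNA-seq, and mined stress-related Hub genes with WGCNA-P and WGCNA-MIC. In terms of prediction performance, every Hub gene set reached an accuracy above 80%, so overall performance was good. In terms of GO enrichment and literature reports, the Hub gene sets for both stresses were enriched in drought- and salt-response terms such as response to endogenous stimulus (GO:0009719), response to hormone stimulus (GO:0009725), and response to abiotic stimulus (GO:0009628). Several Hub genes already reported as drought- or salt-related were also found; EDT1 (Os05g0437700) and OsMYB6 (Os04g0676700) are reported Hub genes for both drought and salt stress. A review of the literature on some of the reported genes shows that one stress may be regulated by several interacting genes and that one gene may take part in several abiotic stresses. Among the 31 reported drought-responsive genes, for example, expression of OsP5CS (Os05g0455500) [31] is induced by high salt, drought, cold, and ABA treatment; OsMADS26 (Os08g0112700) [32] is a regulatory hub for the response of rice to multiple stresses; OsbZIP23 (Os02g0766700) [33] enhances drought and salt tolerance and ABA sensitivity in rice; OsSIK1 (Os06g0130100) [34] plays an important role in salt and drought tolerance; and OsRPK1 (Os09g0552300) [35] is up-regulated under salt stress and is also induced by cold, drought, and abscisic acid. Among the 22 reported salt-responsive genes, OsSRFP1 (Os03g0348900) [36] negatively regulates salt and low-temperature tolerance in rice; drought and salt treatments up-regulate OsPTR8 (Os03g0719900) [37]; OsSCP (Os07g0129200) [38] acts in the abiotic stress response by regulating stress-responsive genes; cold and salt stress cause a 2-fold increase in OsPIMT1 (Os08g0557000) [39] expression; and OsLEA5 (Os05g0584200) [40] is associated with resistance to multiple abiotic stresses. In addition, compared with the results of Li et al. [10], Zhu et al. [11], and Lv et al. [12], the Hub genes mined here include some widely reported transcription factors involved in the rice drought and salt stress responses: for drought, members of the bZIP (Os02g0766700, Os06g0211200, Os05g0569300), MYB (Os04g0676700), NAC (Os11g0126900), and HSF (Os03g0745000) families; for salt, members of the bZIP (Os05g0437700) and MYB (Os04g0676700) families. Finally, analysis of the interaction network among the reported stress-related Hub genes and their associated genes in the overall Hub gene sets identified further candidate genes closely associated with drought or salt stress.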

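In summary, integrating rice multi-platform gene expression data by meta-analysis can identify key genes for drought and salt stress in rice and offers a useful reference for mining abiotic stress-responsive genes in crops. In the STRING analysis, the number of candidate genes depends on the threshold settings. The candidates here were obtained with combined_score ≥ 0.9; the threshold can be adjusted as appropriate, and the candidates remain to be verified by real-time quantitative PCR (RT-qPCR).

4 Conclusions

Applying weighted gene co-expression network analysis, meta-analysis, and protein interaction network analysis to multi-platform data yielded 1936 drought-related and 1504 salt-related Hub genes in rice. Of these, 31 drought-related and 22 salt-related Hub genes have been reported in the literature, and 11 drought and 5 salt candidate genes were predicted. Multi-platform data for other abiotic stresses in rice (such as cold and heat stress) have a similar structure and experimental basis to the drought and salt data, so the method can be extended to mining genes related to other abiotic stresses. This study offers a new approach to making full use of multi-platform data for mining rice abiotic stress-related genes and provides a reference for further research on stress-resistant rice varieties.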

References

[1] Gong Z, Xiong L, Shi H, Yang S, Herrera-Estrella L R, Xu G H, Chao D Y, Li J R, Wang P Y, Qin, Li J G, Ding Y L, Shi Y T, Wang, Yang Y Q, Guo, Zhu J K. Plant abiotic stress response and nutrient use efficiency. Sci China (Life Sci Edn), 2020, 63: 635-674.

[2] Hossain M, Bassel G, Pritchard, Sharma G, Ford-Lloyd B V. Trait specific expression profiling of salt stress responsive genes in diverse rice genotypes as determined by modified significance analysis of microarrays. Front Plant Sci, 2016, 7: 567.

[3] Zhou, Yang, Cui F, Zhang F, Luo X, Xie J. Transcriptome analysis of salt stress responsiveness in the seedlings of Dongxiang wild rice (Oryza rufipogon Griff.). PLoS One, 2016, 11(1): e0146242.

[4] Song, Das, Yang, Chen M, Tian, Cheng C, Sun, Xu W, Zhang J. Genome-wide transcriptome analysis of roots in two rice varieties in response to alternate wetting and drying irrigation. Crop J, 2020, 8: 586-601.

[5] Kaur, Iquebal M, Jaiswal, Tandon, Sundaram R, Gautam R, Suresh K, Rai, Kumar. A meta-analysis of potential candidate genes associated with salinity stress tolerance in rice. Agric Gene, 2016, 1: 126-134.

[6] Serin E A, Nijveen, Hilhorst H W, Ligterink. Learning from co-expression networks: possibilities and challenges. Front Plant Sci, 2016, 7: 444.

[7] Zhou G, Soufan, Ewald, Hancock R E W, Basu, Xia J G. NetworkAnalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis. Nucleic Acids Res, 2019, 47(W1): W234-W241.

[8] Wang K L, Zhang L, Liu X J. Differential expression analysis based on integrating transcriptome expression data from multiple platforms. Chin J Comp, 2018, 41: 1195-1210 (in Chinese with English abstract).

[9] Cheng Y, Li, Qin Z, Li, Qi. Identification of castration-resistant prostate cancer-related hub genes using weighted gene co-expression network analysis. J Cell Mol Med, 2020, 24: 1-12.

[10] Li X K, Li R J, Zhang B J. Identification of rice stress-related gene co-expression modules by WGCNA. Acta Agron Sin, 2019, 45: 1349-1364 (in Chinese with English abstract).

[11]

Zhu M, Xie H, Wei X, Dossa, Yu Y, Hui S, Tang G, Zeng X, Yu Y, Hu P, Wang J. WGCNA analysis of salt-responsive core transcriptome identifies novel hub genes in rice. Genes, 2019, 10:719.

[Cited in text: 2]

[12]

Lv Y, Xu, Dossa, Zhou, Zhu M, Xie H, Tang S, Yu Y, Guo X, Zhou. Identification of putative drought-responsive genes in rice using gene co-expression analysis. Bioinformation, 2019, 15:480-488.

[Cited in text: 2]

[13]

Hopper D, Ghan, Schlauch K, Cramer G. Transcriptomic network analyses of leaf dehydration responses identify highly connected ABA and ethylene signaling hubs in three grapevine species differing in drought tolerance. BMC Plant Biol, 2016, 16:118.

[Cited in text: 1]

[14]

秦天元, 孙超, 毕真真, 梁文君, 李鹏程, 张俊莲, 白江平. 基于WGCNA的马铃薯根系抗旱相关共表达模块鉴定和核心基因发掘. 作物学报, 2020, 46:1033-1051.

Qin T, Sun, Bi Z, Liang W, Li P, Zhang J, Bai J. Identification of drought-related co-expression modules and hub genes in potato roots based on WGCNA. Acta Agron Sin, 2020, 46:1033-1051 (in Chinese with English abstract).

[Cited in text: 1]

[15]

Reshef D, Reshef Y, Finucane H, Grossman S, McVean, Turnbaugh P J, Lander E S, Mitzenmacher, Sabeti P C. Detecting novel associations in large data sets. Science, 2011, 334:1518-1524.

[Cited in text: 1]

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R(2)) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.

[16]

Britt C, Weisburd D. Statistical power. In: Handbook of Quantitative Criminology. New York: Springer, 2010. pp 313-32.

[Cited in text: 1]

[17]

Durinck, Spellman P, Birney, Huber. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc, 2009, 4:1184-1191.

[Cited in text: 1]

[18]

Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics, 2005, 21:3439-3440.

[Cited in text: 1]

biomaRt is a new Bioconductor package that integrates BioMart data resources with data analysis software in Bioconductor. It can annotate a wide range of gene or gene product identifiers (e.g. Entrez-Gene and Affymetrix probe identifiers) with information such as gene symbol, chromosomal coordinates, Gene Ontology and OMIM annotation. Furthermore biomaRt enables retrieval of genomic sequences and single nucleotide polymorphism information, which can be used in data analysis. Fast and up-to-date data retrieval is possible as the package executes direct SQL queries to the BioMart databases (e.g. Ensembl). The biomaRt package provides a tight integration of large, public or locally installed BioMart databases with data analysis in Bioconductor creating a powerful environment for biological data mining.

[19]

Kroll K, Mokaram N, Pelletier A, Frankhouser D, Westphal M, Stump P, Stump C, Bundschuh, Blachly J, Yan. Quality control for RNA-Seq (QuaCRS): an integrated quality control pipeline. Cancer Inform, 2014, 13(S3):7-14.

[Cited in text: 1]

[20]

Chen S, Zhou Y, Chen Y, Gu. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 2018, 34:i884-i890.

[Cited in text: 1]

[21]

Liao, Smyth G, Shi. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics, 2014, 30:923-930.

[Cited in text: 1]

[22]

Love M, Huber, Anders. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol, 2014, 15:31-46.

[Cited in text: 1]

[23]

Smita, Katiyar, Pandey D, Chinnusamy, Archak, Bansal K. Identification of conserved drought stress responsive gene-network across tissues and developmental stages in rice. Bioinformation, 2013, 9:72-78.

[Cited in text: 1]

[24]

Shaik, Ramakrishna. Machine learning approaches distinguish multiple stress conditions using stress-responsive genes and identify candidate genes for broad resistance in rice. Plant Physiol, 2014, 164:481-495.

[Cited in text: 1]

Abiotic and biotic stress responses are traditionally thought to be regulated by discrete signaling mechanisms. Recent experimental evidence revealed a more complex picture where these mechanisms are highly entangled and can have synergistic and antagonistic effects on each other. In this study, we identified shared stress-responsive genes between abiotic and biotic stresses in rice (Oryza sativa) by performing meta-analyses of microarray studies. About 70% of the 1,377 common differentially expressed genes showed conserved expression status, and the majority of the rest were down-regulated in abiotic stresses and up-regulated in biotic stresses. Using dimension reduction techniques, principal component analysis, and partial least squares discriminant analysis, we were able to segregate abiotic and biotic stresses into separate entities. The supervised machine learning model, recursive-support vector machine, could classify abiotic and biotic stresses with 100% accuracy using a subset of differentially expressed genes. Furthermore, using a random forests decision tree model, eight out of 10 stress conditions were classified with high accuracy. Comparison of genes contributing most to the accurate classification by partial least squares discriminant analysis, recursive-support vector machine, and random forests revealed 196 common genes with a dynamic range of expression levels in multiple stresses. Functional enrichment and coexpression network analysis revealed the different roles of transcription factors and genes responding to phytohormones or modulating hormone levels in the regulation of stress responses. We envisage the top-ranked genes identified in this study, which highly discriminate abiotic and biotic stresses, as key components to further our understanding of the inherently complex nature of multiple stress responses in plants.

[25]

Tian, Liu, Yan H, You, Yi, Du, Xu W, Su. AgriGO v2.0: a GO analysis toolkit for the agricultural community. Nucleic Acids Res, 2017, 45(W1):W122-W129.

[Cited in text: 1]

[26]

Szklarczyk, Gable A, Lyon, Junge, Wyder, Huerta-Cepas, Simonovic, Doncheva N, Morris J, Bork, Jensen L, Mering C. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res, 2019, 47(D1):D607-D613.

[Cited in text: 1]

Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/.

[27]

Shannon, Markiel, Ozier, Baliga N, Wang J, Ramage, Amin, Schwikowski, Ideker. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res, 2003, 13:2498-2504.

[Cited in text: 1]

[28]

王胜昌, 涂海甫, 胡丹, 吴奈, 岑祥, 熊立仲. 水稻抗非生物逆境功能基因的发掘. 生命科学, 2016, 28:1216-1229.

Wang S, Tu H, Hu, Wu, Cen, Xiong L. The exploitation of rice functional genes for abiotic stress. Chin Bull Life Sci, 2016, 28:1216-1229 (in Chinese with English abstract).

[Cited in text: 1]

[29]

Zahedi S, Karimi, Venditti. Plants adapted to arid areas: specialized metabolites. Nat Prod Res, 2019: 1-18.

[Cited in text: 1]

[30]

Murad M, Khan A, Muneer. Silicon in horticultural crops: cross-talk, signaling, and tolerance mechanism under salinity stress. Plants, 2020, 9:460.

[Cited in text: 1]

[31]

Sripinyowanich, Klomsakul, Boonburapong, Bangyeekhun, Asami, Gu Y, Buaboocha, Chadchawan. Exogenous ABA induces salt tolerance in indica rice (Oryza sativa L.): the role of OsP5CS1 and OsP5CR gene expression during salt stress. Environ Exp Bot, 2013, 86:94-105.

[Cited in text: 1]

[32]

Ngan K, Pati P, Richaud, Parizot, Bidzinski, Mai C, Bès, Bourrié, Meynard, Beeckman, Selvaraj M, Manabu, Genga A, Brugidou, Do V, Guiderdoni, Morel J, Gantet. OsMADS26 negatively regulates resistance to pathogens and drought tolerance in rice. Plant Physiol, 2015, 169:2935-2949.

[Cited in text: 1]

[33]

Zong, Tang, Yang, Peng, Ma S, Xu, Li G, Xiong L. Feedback regulation of ABA signaling and biosynthesis by a bZIP transcription factor targets drought-resistance-related genes. Plant Physiol, 2016, 171:2810-2825.

[Cited in text: 1]

The OsbZIP23 transcription factor has been characterized for its essential role in drought resistance in rice (Oryza sativa), but the mechanism is unknown. In this study, we first investigated the transcriptional activation of OsbZIP23. A homolog of SnRK2 protein kinase (SAPK2) was found to interact with and phosphorylate OsbZIP23 for its transcriptional activation. SAPK2 also interacted with OsPP2C49, an ABI1 homolog, which deactivated the SAPK2 to inhibit the transcriptional activation activity of OsbZIP23. Next, we performed genome-wide identification of OsbZIP23 targets by immunoprecipitation sequencing and RNA sequencing analyses in the OsbZIP23-overexpression, osbzip23 mutant, and wild-type rice under normal and drought stress conditions. OsbZIP23 directly regulates a large number of reported genes that function in stress response, hormone signaling, and developmental processes. Among these targets, we found that OsbZIP23 could positively regulate OsPP2C49, and overexpression of OsPP2C49 in rice resulted in significantly decreased sensitivity of the abscisic acid (ABA) response and rapid dehydration. Moreover, OsNCED4 (9-cis-epoxycarotenoid dioxygenase4), a key gene in ABA biosynthesis, was also positively regulated by OsbZIP23. Together, our results suggest that OsbZIP23 acts as a central regulator in ABA signaling and biosynthesis, and drought resistance in rice.

[34]

Ouyang S, Liu Y, Liu, Lei, He S, Ma, Zhang W, Zhang J, Chen S. Receptor-like kinase OsSIK1 improves drought and salt stress tolerance in rice (Oryza sativa) plants. Plant J, 2010, 62:316-329.

[Cited in text: 1]

[35]

Cheng Y, Qi Y, Zhu, Chen, Wang, Zhao, Chen H, Cui X, Xu L, Zhang. New changes in the plasma-membrane-associated proteome of rice roots under salt stress. Proteomics, 2009, 9:3100-3114.

[Cited in text: 1]

[36]

Fang H, Meng Q, Xu J, Tang H, Tang S, Zhang H, Huang. Knock-down of stress inducible OsSRFP1 encoding an E3 ubiquitin ligase with transcriptional activation activity confers abiotic stress tolerance through enhancing antioxidant protection in rice. Plant Mol Biol, 2015, 87:441-458.

[Cited in text: 1]

[37]

Ouyang, Cai Z, Xia K, Wang Y, Duan, Zhang M. Identification and analysis of eight peptide transporter homologs in rice. Plant Sci, 2010, 179:374-382.

[Cited in text: 1]

[38]

Kothari K, Dansana P, Jitender, Tyagi A. Rice stress associated protein 1 (OsSAP1) interacts with aminotransferase (OsAMTR1) and pathogenesis-related 1a protein (OsSCP) and regulates abiotic stress responses. Front Plant Sci, 2016, 7:1057.

[Cited in text: 1]

[39]

Wei Y, Xu H, Diao L, Zhu Y, Xie H, Cai Q, Wu F, Wang Z, Zhang J, Xie H. Protein repair L-isoaspartyl methyl transferase 1 (PIMT1) in rice improves seed longevity by preserving embryo vigor and viability. Plant Mol Biol, 2015, 89:475-492.

[Cited in text: 1]

[40]

, Tan L, Hu Z, Chen G, Wang G, Hu T. Molecular characterization and functional analysis by heterologous expression in E. coli under diverse abiotic stresses for OsLEA5, the atypical hydrophobic LEA protein from Oryza sativa L. Mol Genet Genom, 2012, 287:39-54.

[Cited in text: 1]