Major Characteristics, Often-Raised Queries and Potential Usefulness of the Restricted Two-Stage Multi-Locus Genome-Wide Association Analysis (RTM-GWAS)

GAI JunYi, HE JianBo

Soybean Research Institute, Nanjing Agricultural University/National Center for Soybean Improvement/Key Laboratory of Biology and Genetic Improvement of Soybean (General), Ministry of Agriculture/State Key Laboratory for Crop Genetics and Germplasm Enhancement/Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing 210095

Responsible editor: LI Li
Received: 2020-01-2; Accepted: 2020-02-15; Published online: 2020-05-16

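Funding:

National Natural Science Foundation of China (31701447)
National Key R&D Program for Crop Breeding (2017YFD0101500)
National Key R&D Program for Crop Breeding (2017YFD0102002)
**** and Innovative Research Team Development Program (PCSIRT_17R55)
Ministry of Education 111 Project (B08025)
Fundamental Research Funds for the Central Universities (KYT201801)
National Soybean Industry Technology System of the Ministry of Agriculture (CARS-04)
Special Fund of the Jiangsu Priority Academic Program Development project
Jiangsu JCIC-MCP project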

About the authors
GAI JunYi, E-mail: sri@njau.edu.cn

Abstract

Restricted two-stage multi-locus genome-wide association analysis (RTM-GWAS) is a newly established GWAS procedure for the thorough detection of QTL systems with varying numbers of multiple alleles in natural populations and in bi- or multi-parental derived populations. This paper presents the rationale for RTM-GWAS and its two major characteristics: SNPLDB markers with multiple alleles that fit the properties of natural and bi- or multi-parental derived populations, and a multi-locus association model in which the total genetic contribution is kept within the heritability value. Readers and editors generally do not dispute the methods and principles, the multi-allelic markers or the multi-locus model. Their queries concern two points. First, RTM-GWAS detects many more QTL than the single-locus MLM-GWAS procedure, and the additional QTL are suspected to be false positives. Second, the conventional significance level, used without correction, is considered too lenient for association analysis. Both queries are addressed in detail. Finally, the potential usefulness of RTM-GWAS is summarized, including the dissection of genetic systems and the cloning of important genes, the genetic dissection of bi- and multi-parental derived populations, studies of population genetic differentiation and evolution, and breeding by design.

Keywords: restricted two-stage multi-locus genome-wide association analysis (RTM-GWAS); SNP linkage disequilibrium block (SNPLDB); multiple alleles; multi-locus model; QTL-allele matrix; false positive; model significance

Cite this article:
GAI JunYi, HE JianBo. Major Characteristics, Often-Raised Queries and Potential Usefulness of the Restricted Two-Stage Multi-Locus Genome-Wide Association Analysis[J]. Scientia Agricultura Sinica, 2020, 53(9): 1699-1703. doi:10.3864/j.issn.0578-1752.2020.09.001

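To overcome the limitations of genome-wide association studies (GWAS) in genetic and breeding research, HE et al.^[1] proposed the restricted two-stage multi-locus genome-wide association analysis (RTM-GWAS) to dissect the QTL-allele constitution of quantitative traits thoroughly. RTM-GWAS has since been applied in genetic studies of a number of populations^[2-9]. At the invitation of the editorial office, our group uses studies on soybean genetics and germplasm resources as examples to illustrate the application prospects of the method. The special topic comprises five application articles: the application of RTM-GWAS in genetic and breeding studies^[10]; genome-wide QTL-allele dissection of 100-seed weight in the Northeast China soybean germplasm population^[11]; the detection power of RTM-GWAS applied to 100-seed weight QTL in a soybean RIL population^[12]; genetic dissection of protein content in a soybean nested association mapping population^[13]; and a comparison of linkage and association mapping of isoflavone-content QTL in a soybean recombinant inbred line population^[14]. The first article briefly introduces the principles of RTM-GWAS. The other four cover the mapping and detection of QTL and their alleles in germplasm populations and in bi- or multi-parental derived populations (RIL, NAM), and the use of the resulting QTL-allele data in gene discovery and population genetic studies. To help readers understand RTM-GWAS and its applications, this paper focuses on the rationale for proposing RTM-GWAS and its two major characteristics, the queries raised about RTM-GWAS during editing and review together with our responses, and the application prospects of the method.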

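1 Rationale for RTM-GWAS and its two major characteristics

GWAS exploits the genetic variation widely present in natural populations and detects quantitative trait loci (QTL) by testing the association between molecular markers and phenotypes, which provides an effective means of dissecting the genetic system of quantitative traits. GWAS is usually based on single nucleotide polymorphism (SNP) markers. A single SNP marker generally has only two alleles and can therefore fully fit only QTL with two alleles. However, natural populations (including germplasm populations) have been exposed to diverse environments over long periods, so new alleles arise at the same locus and multiple alleles form. Multi-parental derived populations also carry multiple alleles because of heterogeneity among the parents. This differs from bi-parental segregating populations (for example, recombinant inbred line populations), in which each QTL has only two alleles. SNP markers therefore cannot detect the multiple alleles that are widespread in natural and multi-parental derived populations, which may reduce the detection power of GWAS. In addition, conventional plant breeding is a genetic manipulation process of pyramiding superior alleles^[15], and obtaining the multiple alleles of each QTL with their effect estimates is a prerequisite for marker-assisted selection. Biallelic SNP markers thus limit, to some extent, the application of GWAS in breeding.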

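Commonly used GWAS methods, such as the mixed linear model (MLM) method^[16], generally detect QTL with a single-locus model, in which the association of each marker with the phenotype is tested independently. In reality, a quantitative trait is controlled by a large number of QTL, so the estimates of the effect and contribution of a locus in a single-locus model are inevitably influenced by neighboring QTL, which interferes with GWAS. Overestimation of single-locus effects (contributions) may cause the total phenotypic variation explained by the QTL to exceed the trait heritability, or even 100%^[7]. To control this inflation, statisticians proposed adding an experiment-wise test on top of the individual locus tests. The Bonferroni method, for example, sets the test threshold to the significance level divided by the number of markers. Because GWAS involves high-density genome-wide SNP markers, keeping the experiment-wise significance threshold at a conventional level (P=0.05) forces the Bonferroni method to set a very stringent threshold for each marker (a very large -lgP value). Although a stringent threshold effectively lowers the experiment-wise error rate, it also leads to a high false-negative rate, so that GWAS often detects only a few QTL, which explain only a small part of the genetic variation, and fails to detect the genome-wide QTL of the trait adequately.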
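To make the scale of the Bonferroni threshold concrete, the following minimal Python sketch computes the per-marker threshold and its -lgP value. The function name `bonferroni_threshold` and the marker counts are illustrative assumptions, not values taken from the studies cited here.

```python
import math


def bonferroni_threshold(alpha, n_markers):
    """Return (per-marker P threshold, its -log10 value) for a Bonferroni correction."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    per_marker = alpha / n_markers
    return per_marker, -math.log10(per_marker)


if __name__ == "__main__":
    # Hypothetical marker counts, chosen only to show how the threshold scales.
    for n in (1_000, 100_000, 1_000_000):
        p, neg_log_p = bonferroni_threshold(0.05, n)
        print(f"{n:>9} markers: P <= {p:.2e}, -lgP >= {neg_log_p:.2f}")
```

Under this rule every tenfold increase in marker number raises the required -lgP by one unit, which is why dense marker sets push single-locus thresholds so high.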

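RTM-GWAS is a newly established association analysis method that can thoroughly detect QTL systems with varying numbers of multiple alleles in natural populations and in bi- or multi-parental derived populations. It uses multi-allelic markers and a multi-locus model as the keys to the two problems above. These are the two major characteristics of RTM-GWAS, and the aim is a thorough dissection of the QTL-allele constitution of quantitative traits in a population^[1,17]. First, RTM-GWAS constructs multi-allelic SNP linkage disequilibrium block (SNPLDB) markers to detect multiple alleles in natural populations. SNPLDB markers are rich in alleles and can fit multiple alleles at a QTL. They match the properties of natural populations better than biallelic SNP markers and thus improve detection power. The linkage disequilibrium decay distance based on SNPLDB markers is shorter than that based on SNP markers, so mapping precision may also improve. Second, RTM-GWAS uses a multi-locus, multi-allele model to detect genome-wide QTL and their multiple alleles, taking the heritability value as the upper limit of the total contribution of the detected QTL, so that false positives are reasonably controlled. To improve detection efficiency, RTM-GWAS adopts a two-stage analysis strategy to reduce the computational load of the multi-locus model, and uses a genetic similarity coefficient matrix based on SNPLDB markers to control false positives caused by population structure bias. Because the significance test of a multi-locus model is a test of the whole model, no further experiment-wise multiple testing is needed and the significance level need not be corrected. A conventional significance level (for example, 0.01 or 0.05) is therefore used, the number of QTL actually detected increases greatly, and the false negatives caused by overly stringent significance levels are avoided to some extent. In addition, RTM-GWAS is closely integrated with the experimental design of phenotyping: the raw data of the designed experiment (including environments and blocks) can be analyzed directly. RTM-GWAS thus combines the strict error control of experimental design with association analysis, which reduces experimental error and improves QTL detection. For details, see references^[1-9,17].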
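The difference between a biallelic SNP and a multi-allelic block marker can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the lines are inbred (one haplotype each), the block is given as a fixed index range, and the genotypes are invented. How RTM-GWAS actually delimits blocks from linkage disequilibrium and treats rare haplotypes or missing calls is not shown, and `block_alleles` is a hypothetical helper, not part of the RTM-GWAS software.

```python
from collections import Counter


def block_alleles(genotypes, start, end):
    """Count haplotype alleles formed by SNP columns [start, end) across lines.

    genotypes maps a line name to a string with one base per SNP,
    assuming homozygous (inbred) lines so each line carries one haplotype.
    """
    return Counter(calls[start:end] for calls in genotypes.values())


# Hypothetical genotypes of five inbred lines at four SNPs.
genotypes = {
    "line1": "ACGT",
    "line2": "ACGA",
    "line3": "ATGT",
    "line4": "GTGT",
    "line5": "ATGA",
}

single_snp = block_alleles(genotypes, 0, 1)  # the first SNP on its own
block = block_alleles(genotypes, 0, 3)       # the first three SNPs as one block
print(len(single_snp), dict(single_snp))
print(len(block), dict(block))
```

In this toy data set the single SNP distinguishes two alleles, while the three-SNP block distinguishes three haplotype alleles, which is the property that lets a block marker fit a QTL with more than two alleles.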

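2 Common queries about RTM-GWAS and our responses

During publication of the RTM-GWAS method and its application papers, reviewers generally raised no objection to the method and principles described above, or to its two major characteristics, the multi-allelic markers and the multi-locus model, but they did express doubts and queries. The most frequent queries fall into two groups. First, RTM-GWAS detects many QTL, far more than the conventional single-locus MLM model, and the additional QTL are suspected to be false positives. Second, the conventional significance level sets the threshold too low and is considered unsuitable for association analysis.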

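2.1 Response to the concern that the many detected QTL imply a high false-positive rate

Compared with other commonly used GWAS methods, RTM-GWAS usually detects more QTL, which is consistent with the purpose for which it was designed. The aim is to recover as much as possible of the QTL genetic variation reflected by heritability, while the total contribution should not exceed the experiment-wise heritability value. This is achieved by testing the significance of the model in the multi-locus analysis. As long as the contribution of the model does not exceed the heritability value, all detected QTL should be reasonable rather than false positives. Moreover, as noted above, RTM-GWAS is closely integrated with precise experimental design, which lowers error and raises detection power, an aspect that other GWAS methods have not addressed. In addition, the soybean genome contains 45,000-50,000 genes. A quantitative trait is a complex trait whose genetic constitution is a gene network involving a large number of mutually interacting loci with unequal effects. A large number of detected QTL is therefore not an error. Rather, it indirectly reflects the complexity of the genetic regulatory network of quantitative traits. In actual experiments, the total QTL contribution usually still falls short of the trait heritability, which indicates that some QTL remain undetected. Improving experimental precision and increasing marker density may uncover a further batch of small-contribution QTL.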
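The heritability constraint described above amounts to a simple bookkeeping check, sketched below in Python. The per-QTL contributions and the heritability value are hypothetical, and `contribution_summary` is an illustrative helper rather than a function of the RTM-GWAS software. The sketch only checks a finished result and does not reproduce the model test itself.

```python
def contribution_summary(qtl_r2, heritability):
    """Sum per-QTL contributions (%) and compare the total with heritability (%)."""
    total = sum(qtl_r2)
    return {
        "total": total,
        "within_heritability": total <= heritability,
        # Part of the heritable variation not yet accounted for by detected QTL.
        "unexplained": max(heritability - total, 0.0),
    }


# Hypothetical contributions (%) of six detected QTL and a trait heritability of 80%.
detected = [20.0, 12.5, 8.0, 4.5, 2.0, 1.0]
summary = contribution_summary(detected, 80.0)
print(summary)

# A hypothetical overflowing result, as can happen when single-locus
# estimates are inflated by neighboring QTL.
print(contribution_summary([70.0, 60.0], 80.0)["within_heritability"])
```

A positive `unexplained` value corresponds to the situation described in the text, where some QTL remain undetected and more precise experiments or denser markers might recover them.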

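2.2 Response to the concern that the significance level is not stringent enough

To balance false positives and false negatives in hypothesis testing, statistics commonly uses 0.01 or 0.05 as the significance level. More stringent levels may be used for specific purposes, depending on importance and cost. RTM-GWAS therefore also uses the conventional significance level of 0.01 or 0.05 to test the significance of the model and control the experiment-wise error rate. As stated above, because RTM-GWAS is based on a multi-locus model, the significance of the model itself indicates the reasonableness of the model and of the QTL it includes, and no additional multiple-testing correction is needed. This differs from a single-locus model, in which significance shows only that the locus in question is significant and reasonable and says nothing about the joint significance and reasonableness of all detected loci. An experiment-wise test of the detected loci as a whole, such as the Bonferroni correction, must therefore be added. In short, under a multi-locus model the significance of the model already covers all the selected QTL and no further correction is required, whereas under a single-locus model the set of individually selected loci must pass an overall significance test, that is, an additional multiple test or Bonferroni correction. RTM-GWAS also reports the P value of the statistical test for each detected QTL, so researchers can apply a chosen threshold to screen the QTL further without recomputation. This is possible because the QTL detected by RTM-GWAS are listed in ascending order of P value, and QTL with smaller P values tend to have larger contributions. If a study has stricter requirements, for example because cloning candidate genes is time-consuming and laborious, a stringent threshold can be used to select the relatively more important loci.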
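Screening an existing result with a stricter threshold needs no new analysis, only a filter on the reported P values. The Python sketch below illustrates this. The result rows and the function `screen_qtl` are hypothetical, and the tuple layout is an assumption for illustration, not the output format of the RTM-GWAS software.

```python
def screen_qtl(results, threshold):
    """Keep QTL whose P value is at or below threshold, most significant first."""
    kept = (row for row in results if row[1] <= threshold)
    return sorted(kept, key=lambda row: row[1])


# Hypothetical result rows: (QTL name, P value, contribution in %).
results = [
    ("qtl_a", 3.2e-15, 9.8),
    ("qtl_b", 4.1e-08, 3.1),
    ("qtl_c", 2.7e-04, 0.9),
    ("qtl_d", 1.9e-02, 0.3),
]

for threshold in (0.05, 1e-3, 1e-6):
    names = [name for name, _, _ in screen_qtl(results, threshold)]
    print(threshold, names)
```

Tightening the threshold from 0.05 to 1e-6 in this toy table keeps only the two most significant QTL, which is the kind of shortlist one might want before investing in gene cloning.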

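3 Application prospects of RTM-GWAS

In the past, researchers often used GWAS to find individual genes and so paid little attention to detecting all the QTL in the genetic system of a trait. As described above, RTM-GWAS was established to dissect the genetic system, or QTL-allele matrix, of natural or genetic populations. Since its establishment, the method has been tried in several kinds of application, which are summarized below. More applications remain to be developed through future use.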

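3.1 Dissection of genetic systems and cloning of important genes

Genetic dissection of several soybean quantitative traits shows that, compared with earlier GWAS methods, RTM-GWAS detects more QTL and corresponding alleles, and the total QTL contribution is closer to the trait heritability, which provides a route to studying the complete genetic system of a quantitative trait. Besides dissecting the QTL-allele system of a population in a single environment, RTM-GWAS can dissect gene-by-environment interaction effects from multi-environment phenotypic data. The detected QTL may be of different types: with both main and interaction effects, with a main effect but no interaction effect, or with an interaction effect but no main effect. The results can be summarized as two matrices, a main-effect QTL-allele matrix and an interaction-effect QTL-allele matrix, which opens a path to studying the environmental effects of alleles. In addition, in the RTM-GWAS results the QTL are detected in order of increasing probability value (degree of significance) or decreasing contribution. Besides examining the whole genetic system, one can therefore study individual QTL according to their importance, including selecting important loci for gene cloning.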

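3.2 Genetic dissection of populations derived from bi- or multi-parental crosses

RTM-GWAS is particularly suitable for progeny populations of bi- or multi-parental crosses, such as recombinant inbred line populations and nested association mapping populations. Unlike natural populations, bi-parental populations have a regular genetic constitution and little interference from population bias, so detection power is higher. Analysis of a soybean recombinant inbred line population shows that RTM-GWAS detects the QTL found by conventional composite interval mapping and also further QTL, explaining more phenotypic variation. Analysis of a soybean nested association mapping population composed of four recombinant inbred line populations shows that RTM-GWAS detects QTL with differing numbers of alleles, each QTL having 2-5 alleles. Earlier methods for nested association mapping populations analyzed multiple recombinant inbred line populations jointly but still treated them as mutually independent^[18], so the multiple alleles of the parents were not estimated accurately. RTM-GWAS instead treats the recombinant inbred line populations as one whole population and uses SNPLDB markers to estimate the different alleles of multiple parents at a QTL, which better fits the genetic properties of the population.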

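3.3 Population genetic differentiation and evolution

RTM-GWAS estimates the effect of every allele at all QTL, from which the QTL-allele matrix of a quantitative trait can be built, that is, the matrix of effects of every accession in the population at every QTL. The QTL-allele matrix contains the whole genetic constitution of the trait in the population. It can be used for candidate gene discovery and is especially suited to analyses of population genetic differentiation and evolution. For example, in a genetic study of the number of main stem nodes in soybeans of different maturity groups in Northeast China (to be published in Crop Science), the number of main stem nodes decreased from 17.89 in the late maturity groups (MGI, MGII) to 13.11 in the early maturity groups (MG0, MG00, MG000). In the Northeast China soybean germplasm population, RTM-GWAS detected 76 QTL for the number of main stem nodes, comprising 183 alleles and explaining 65.63% of the phenotypic variation. During evolution from the late to the early maturity groups, 28.42% of the alleles changed: newly emerged alleles accounted for 6.56% and excluded alleles for 21.86%, whereas 71.58% of the alleles passed directly from the late to the early groups. This indicates that in the evolution of main stem node number in Northeast China soybeans, inheritance was the primary driving force, followed by exclusion or selection (exclusion of positive-effect alleles) and then by emergence or mutation (emergence of negative-effect alleles). Finally, genetic recombination among the remaining QTL alleles produced genetic differentiation and evolution of the population.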
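The allele turnover percentages quoted above can be reproduced with simple set arithmetic, as in the Python sketch below. The source reports only the total of 183 alleles and the percentages. The counts of 131 inherited, 40 excluded and 12 newly emerged alleles used here are back-calculated from those percentages and should be read as an assumption, and `allele_turnover` is an illustrative helper.

```python
def allele_turnover(late_alleles, early_alleles):
    """Percentage of alleles inherited, excluded or newly emerged between two groups."""
    late, early = set(late_alleles), set(early_alleles)
    total = len(late | early)
    counts = {
        "inherited": len(late & early),   # present in both groups
        "excluded": len(late - early),    # present only in the late groups
        "emerged": len(early - late),     # present only in the early groups
    }
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


# Assumed counts (131 + 40 + 12 = 183), back-calculated from the reported percentages.
inherited = {f"inherited_{i}" for i in range(131)}
excluded = {f"excluded_{i}" for i in range(40)}
emerged = {f"emerged_{i}" for i in range(12)}

shares = allele_turnover(inherited | excluded, inherited | emerged)
changed = round(shares["excluded"] + shares["emerged"], 2)
print(shares, changed)
```

With these assumed counts the shares come out at 71.58%, 21.86% and 6.56%, and the changed fraction at 28.42%, matching the figures in the text.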

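3.4 Application of the QTL-allele matrix to breeding by design

Parent combination and progeny selection are the two main steps of conventional breeding. For the genetic improvement of complex traits, background selection and foreground selection are equally important, and recombination and selection at only a few major loci are unlikely to produce breakthrough cultivars. The QTL-allele matrix established by RTM-GWAS provides a genetic basis for parent combination and progeny selection. From the QTL-allele matrix, the homozygous progeny population of every parental combination can be predicted, so that optimal crosses can be selected. Selection based on the QTL-allele matrix is therefore direct selection on the loci of the target trait and better matches practical breeding needs. The matrix can also be used to design the optimal genotype (the combination of the best allele at every locus), and progeny can then be screened by marker-assisted selection with the corresponding markers. This differs fundamentally from the genome-wide selection method of MEUWISSEN et al.^[19]. Genome-wide selection first uses a reference subpopulation to establish a composite linear relationship between molecular markers and the phenotypes of multiple target traits. For each trait this relationship is a black box. The genome-wide marker information of an individual is then used to predict the composite breeding value of candidate individuals, and individuals are selected on that value. Because crop breeding usually involves many crosses, each with more than a thousand progeny, genotyping large numbers of genome-wide markers for genome-wide selection is costly, whereas the QTL-allele matrix of a trait involves only the markers of the target trait. Even with several traits, the markers involved are a very small fraction of the genome-wide set. Besides its use in optimal cross design, the QTL-allele matrix may thus also offer an effective route to marker-assisted design and selection among the progeny of a cross, although this remains to be tested in practice.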
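How a QTL-allele matrix can guide cross design may be sketched in Python as follows. This is a deliberately simplified illustration, not the prediction procedure used in the cited studies: it assumes purely additive allele effects, inbred parents and free recombination among QTL (no linkage), so that a homozygous progeny line may carry either parental allele at each locus. The matrix values and the helper names `predict_cross` and `rank_crosses` are hypothetical.

```python
from itertools import combinations


def predict_cross(effects_a, effects_b):
    """Best and worst homozygous progeny values for a cross of two inbred parents.

    Assumes additive allele effects and free recombination among QTL, so a
    progeny line can carry either parental allele at each locus.
    """
    best = sum(max(a, b) for a, b in zip(effects_a, effects_b))
    worst = sum(min(a, b) for a, b in zip(effects_a, effects_b))
    return best, worst


def rank_crosses(matrix):
    """Rank all pairwise crosses by the predicted best progeny value."""
    ranked = [
        ((p1, p2), predict_cross(matrix[p1], matrix[p2])[0])
        for p1, p2 in combinations(sorted(matrix), 2)
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical QTL-allele matrix: allele effects of three parents at three QTL.
matrix = {
    "P1": [1.0, -0.5, 0.5],
    "P2": [-1.0, 1.5, 0.5],
    "P3": [0.5, 0.5, -1.0],
}

for cross, best in rank_crosses(matrix):
    print(cross, best)

# Designed optimal genotype: the best allele at every locus across all parents.
optimal_genotype = sum(max(column) for column in zip(*matrix.values()))
print(optimal_genotype)
```

In this toy matrix the parents P1 and P2 each have a genotypic value of 1.0, yet their cross is predicted to yield progeny up to 3.0, which illustrates the transgressive potential that cross prediction is meant to reveal. With linked QTL, the favorable recombinants would be rarer than this sketch suggests.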

References

[1] HE, MENG, ZHAO, XING, YANG, LI, GUAN, LU, WANG, XIA, YANG, GAI. An innovative procedure of genome-wide association analysis fits studies on germplasm population and plant breeding. Theoretical and Applied Genetics, 2017, 130(11): 2327-2343. DOI: 10.1007/s00122-017-2962-9; PMID: 28828506.

The innovative RTM-GWAS procedure provides a relatively thorough detection of QTL and their multiple alleles for germplasm population characterization, gene network identification, and genomic selection strategy innovation in plant breeding. The previous genome-wide association studies (GWAS) have been concentrated on finding a handful of major quantitative trait loci (QTL), but plant breeders are interested in revealing the whole-genome QTL-allele constitution in breeding materials/germplasm (in which tremendous historical allelic variation has been accumulated) for genome-wide improvement. To match this requirement, two innovations were suggested for GWAS: first grouping tightly linked sequential SNPs into linkage disequilibrium blocks (SNPLDBs) to form markers with multi-allelic haplotypes, and second utilizing two-stage association analysis for QTL identification, where the markers were preselected by single-locus model followed by multi-locus multi-allele model stepwise regression. Our proposed GWAS procedure is characterized as a novel restricted two-stage multi-locus multi-allele GWAS (RTM-GWAS, https://github.com/njau-sri/rtm-gwas). The Chinese soybean germplasm population (CSGP) composed of 1024 accessions with 36,952 SNPLDBs (generated from 145,558 SNPs, with reduced linkage disequilibrium decay distance) was used to demonstrate the power and efficiency of RTM-GWAS. Using the CSGP marker information, simulation studies demonstrated that RTM-GWAS achieved the highest QTL detection power and efficiency compared with the previous procedures, especially under large sample size and high trait heritability conditions. A relatively thorough detection of QTL with their multiple alleles was achieved by RTM-GWAS compared with the linear mixed model method on 100-seed weight in CSGP. A QTL-allele matrix (402 alleles of 139 QTL × 1024 accessions) was established as a compact form of the population genetic constitution. The 100-seed weight QTL-allele matrix was used for genetic characterization, candidate gene prediction, and genomic selection for optimal crosses in the germplasm population.

[2] ZHANG, HE, WANG, XING, ZHAO, LI, YANG, PALMER R, ZHAO, GAI. Establishment of a 100-seed weight quantitative trait locus-allele matrix of the germplasm population for optimal recombination design in soybean breeding programmes. Journal of Experimental Botany, 2015, 66(20): 6311-6325. DOI: 10.1093/jxb/erv342; PMID: 26163701.

A representative sample comprising 366 accessions from the Chinese soybean landrace population (CSLRP) was tested under four growth environments for determination of the whole-genome quantitative trait loci (QTLs) system of the 100-seed weight trait (ranging from 4.59g to 40.35g) through genome-wide association study (GWAS). A total of 116 769 single nucleotide polymorphisms (SNPs) were identified and organized into 29 121 SNP linkage disequilibrium blocks (SNPLDBs) to fit the property of multiple alleles/haplotypes per locus in germplasm. An innovative two-stage GWAS was conducted using a single locus model for shrinking the marker number followed by a multiple loci model utilizing a stepwise regression for the whole-genome QTL identification. In total, 98.45% of the phenotypic variance (PV) was accounted for by four large-contribution major QTLs (36.33%), 51 small-contribution major QTLs (43.24%), and a number of unmapped minor QTLs (18.88%), with the QTL×environment variance representing only 1.01% of the PV. The allele numbers of each QTL ranged from two to 10. A total of 263 alleles along with the respective allele effects were estimated and organized into a 263×366 matrix, giving the compact genetic constitution of the CSLRP. Differentiations among the ecoregion matrices were found. No landrace had alleles which were all positive or all negative, indicating a hidden potential for recombination. The optimal crosses within and among ecoregions were predicted, and showed great transgressive potential. From the QTL system, 39 candidate genes were annotated, of which 26 were involved with the gene ontology categories of biological process, cellular component, and molecular function, indicating that diverse genes are involved in directing the 100-seed weight.

[3] MENG, HE, ZHAO, XING, LI, YANG, LU, WANG, GAI. Detecting the QTL-allele system of seed isoflavone content in Chinese soybean landrace population for optimal cross design and gene system exploration. Theoretical and Applied Genetics, 2016, 129(8): 1557-1576. DOI: 10.1007/s00122-016-2724-0; PMID: 27189002.

Utilizing an innovative GWAS in CSLRP, 44 QTL with 199 alleles with 72.2% contribution to SIFC variation were detected and organized into a QTL-allele matrix for cross design and gene annotation. The seed isoflavone content (SIFC) of soybeans is of great importance to health care. The Chinese soybean landrace population (CSLRP) as a genetic reservoir was studied for its whole-genome quantitative trait loci (QTL) system of the SIFC using an innovative restricted two-stage multi-locus genome-wide association study procedure (RTM-GWAS). A sample of 366 landraces was tested under four environments and sequenced using RAD-seq (restriction-site-associated DNA sequencing) technique to obtain 116,769 single nucleotide polymorphisms (SNPs) then organized into 29,119 SNP linkage disequilibrium blocks (SNPLDBs) for GWAS. The detected 44 QTL with 199 alleles on 16 chromosomes (explaining 72.2% of the total phenotypic variation) with the allele effects (92 positive and 107 negative) of the CSLRP were organized into a QTL-allele matrix showing the SIFC population genetic structure. Additional differentiation among eco-regions due to the SIFC in addition to that of genome-wide markers was found. All accessions comprised both positive and negative alleles, implying a great potential for recombination within the population. The optimal crosses were predicted from the matrices, showing transgressive potentials in the CSLRP. From the detected QTL system, 55 candidate genes related to 11 biological processes were χ²-tested as an SIFC candidate gene system. The present study explored the genome-wide SIFC QTL/gene system with the innovative RTM-GWAS and found the potentials of the QTL-allele matrix in optimal cross design and population genetic and genomic studies, which may have provided a solution to match the breeding by design strategy at both QTL and gene levels in breeding programs.

[4] …, CAO, HE, ZHAO, GAI. Detecting the QTL-allele system conferring flowering date in a nested association mapping population of soybean using a novel procedure. Theoretical and Applied Genetics, 2017, 130(11): 2297-2314. DOI: 10.1007/s00122-017-2960-y; PMID: 28799029.

The RTM-GWAS was chosen among five procedures to identify DTF QTL-allele constitution in a soybean NAM population; 139 QTLs with 496 alleles accounting for 81.7% of phenotypic variance were detected. Flowering date (days to flowering, DTF) is an ecological trait in soybean, closely related to its ability to adapt to areas. A nested association mapping (NAM) population consisting of four RIL populations (LM, ZM, MT and MW with M8206 as their common parent) was established and tested for their DTF under five environments. Using restriction-site-associated DNA sequencing the population was genotyped with SNP markers. The restricted two-stage multi-locus (RTM) genome-wide association study (GWAS) (RTM-GWAS) with SNP linkage disequilibrium block (SNPLDB) as multi-allele genomic markers performed the best among the five mapping procedures with software publicly available. It identified the greatest number of quantitative trait loci (QTLs) (139) and alleles (496) on 20 chromosomes covering almost all of the QTLs detected by four other mapping procedures. The RTM-GWAS provided the detected QTLs with highest genetic contribution but without overflowing and missing heritability problems (81.7% genetic contribution vs. heritability of 97.6%), while SNPLDB markers matched the NAM population property of multiple alleles per locus. The 139 QTLs with 496 alleles were organized into a QTL-allele matrix, showing the corresponding DTF genetic architecture of the five parents and the NAM population. All lines and parents comprised both positive and negative alleles, implying a great potential of recombination for early and late DTF improvement. From the detected QTL-allele system, 126 candidate genes were annotated and χ²-tested as a DTF candidate gene system involving nine biological processes, indicating the trait a complex, involving several biological processes rather than only a handful of major genes.

[5] KHAN M, TONG, WANG, HE, ZHAO, GAI. Analysis of QTL-allele system conferring drought tolerance at seedling stage in a nested association mapping population of soybean [Glycine max (L.) Merr.] using a novel GWAS procedure. Planta, 2018, 248(4): 947-962. DOI: 10.1007/s00425-018-2952-4; PMID: 29980855.

RTM-GWAS identified 111 DT QTLs, 262 alleles with high proportion of QEI and genetic variation accounting for 88.55-95.92% PV in NAM, from which QTL-allele matrices were established and candidate genes annotated. Drought tolerance (DT) is one of the major challenges for world soybean production. A nested association mapping (NAM) population with 403 lines comprising two recombinant inbred line (RIL) populations: M8206 × TongShan and ZhengYang × M8206 was tested for DT using polyethylene-glycol (PEG) treatment under spring and summer environments. The population was sequenced using restriction-site-associated DNA sequencing (RAD-seq) filtered with minor allele frequency (MAF) ≥ 0.01, 55,936 single nucleotide polymorphisms (SNPs) were obtained and organized into 6137 SNP linkage disequilibrium blocks (SNPLDBs). The restricted two-stage multi-locus genome-wide association studies (RTM-GWAS) identified 73 and 38 QTLs with 174 and 88 alleles contributed main effect 40.43 and 26.11% to phenotypic variance (PV) and QTL-environment interaction (QEI) effect 24.64 and 10.35% to PV for relative root length (RRL) and relative shoot length (RSL), respectively. The DT traits were characterized with high proportion of QEI variation (37.52-41.65%), plus genetic variation (46.90-58.40%) in a total of 88.55-95.92% PV. The identified QTLs-alleles were organized into main-effect and QEI-effect QTL-allele matrices, showing the genetic and QEI architecture of the three parents/NAM population. From the matrices, the possible best genotype was predicted to have a weighted average value over two indicators (WAV) of 1.873, while the top ten optimal crosses among RILs with 95th percentile WAV 1.098-1.132, transgressive over the parents (0.651-0.773) but much less than 1.873, implying further pyramiding potential. From the matrices, 134 candidate genes were annotated involved in nine biological processes. The present results provide a novel way for molecular breeding in QTL-allele-based genomic selection for optimal cross selection.

[6] PAN, HE, ZHAO, XING, WANG, YU, CHEN, GAI. Efficient QTL detection of flowering date in a soybean RIL population using the novel restricted two-stage multi-locus GWAS procedure. Theoretical and Applied Genetics, 2018, 131(12): 2581-2599. DOI: 10.1007/s00122-018-3174-7; PMID: 30167759.

Eighty-six R1 QTLs accounting for 89.92% phenotypic variance in a soybean RIL population were identified using RTM-GWAS with SNPLDB marker which performed superior over CIM and MLM-GWAS with BIN/SNPLDB marker. A population (NJRIKY) composed of 427 recombinant inbred lines (RILs) derived from Kefeng-1 × NN1138-2 (MGII × MGV, MG maturity group) was applied for detecting flowering date (R1) quantitative trait locus (QTL) system in soybean. From a low-depth re-sequencing (~0.75×), 576,874 SNPs were detected and organized into 4737 BINs (recombination breakpoint determinations) and 3683 SNP linkage disequilibrium blocks (SNPLDBs), respectively. Using the association mapping procedures "Restricted Two-stage Multi-locus Genome-wide Association Study" (RTM-GWAS), "Mixed Linear Model Genome-wide Association Study" (MLM-GWAS) and the linkage mapping procedure "Composite Interval Mapping" (CIM), 67, 36 and 10 BIN-QTLs and 86, 14 and 23 SNPLDB-QTLs were detected with their phenotypic variance explained (PVE) 88.70-89.92% (within heritability 98.2%), 146.41-353.62% (overflowing) and 88.29-172.34% (overflowing), respectively. The RTM-GWAS with SNPLDBs which showed to be more efficient and reasonable than the others was used to identify the R1 QTL system in NJRIKY. The detected 86 SNPLDB-QTLs with their PVE from 0.02 to 30.66% in a total of 89.92% covered 51 out of 104 R1 QTLs in 18 crosses in SoyBase and 26 out of 139 QTLs in a nested association mapping population, while the rest 29 QTLs were novel ones. From the QTL system, 52 candidate genes were annotated, including the verified gene E1, E2, E9 and J, and grouped into 3 categories of biological processes, among which 24 genes were enriched into three protein-protein interaction networks, suggesting gene networks working together. Since NJRIKY involves only MGII and MGV, the QTL/gene system among MG000-MGX should be explored further.

[7] ZHANG, HE, WANG, MENG, XING, LI, YANG, ZHAO, GAI. Detecting the QTL-allele system of seed oil traits using multi-locus genome-wide association analysis for population characterization and optimal cross prediction in soybean. Frontiers in Plant Science, 2018, 9: 1793. DOI: 10.3389/fpls.2018.01793; PMID: 30568668.

Soybean is one of the world's major vegetative oil sources, while oleic acid and linolenic acid content are the major quality traits of soybean oil. The restricted two-stage multi-locus genome-wide association analysis (RTM-GWAS), characterized with error and false-positive control, has provided a potential approach for a relatively thorough detection of whole-genome QTL-alleles. The Chinese soybean landrace population (CSLRP) composed of 366 accessions was tested under four environments to identify the QTL-allele constitution of seed oil, oleic acid and linolenic acid content (SOC, OAC, and LAC). Using RTM-GWAS with 29,119 SNPLDBs (SNP linkage disequilibrium blocks) as genomic markers, 50, 98, and 50 QTLs with 136, 283, and 154 alleles (2-9 per locus) were detected, with their contribution 82.52, 90.31, and 83.86% to phenotypic variance, corresponding to their heritability 91.29, 90.97, and 90.24% for SOC, OAC, and LAC, respectively. The RTM-GWAS was shown to be more powerful and efficient than previous single-locus model GWAS procedures. For each trait, the detected QTL-alleles were organized into a QTL-allele matrix as the population genetic constitution. From which the genetic differentiation among 6 eco-populations was characterized as significant allele frequency differentiation on 28, 56, and 30 loci for the three traits, respectively. The QTL-allele matrices were also used for genomic selection for optimal crosses, which predicted transgressive potential up to 24.76, 40.30, and 2.37% for the respective traits, respectively. From the detected major QTLs, 38, 27, and 25 candidate genes were annotated for the respective traits, and two common QTL covering eight genes were identified for further study.

[8] ZHANG Y, HE J, MENG, LIU M, XING G, LI, YANG S, YANG J, ZHAO T, GAI J. Identifying QTL-allele system of seed protein content in Chinese soybean landraces for population differentiation studies and optimal cross predictions. Euphytica, 2018, 214(9): 157.

[9] KHAN M, TONG, WANG, HE, ZHAO, GAI, WILLENBORG. Using the RTM-GWAS procedure to detect the drought tolerance QTL-allele system at the seedling stage under sand culture in a half-sib population of soybean [Glycine max (L.) Merr.]. Canadian Journal of Plant Science, 2019, 99(6): 801-814. DOI: 10.1139/cjps-2018-0309.

[10] HE J, LIU F, WANG W, XING G, GUAN R, GAI J. Restricted two-stage multi-locus genome-wide association analysis and its applications to genetic and breeding studies. Scientia Agricultura Sinica, 2020, 53(9): 1699-1703. (in Chinese)

[11] HAO X, FU M, LIU Z, HE J, WANG Y, REN H, WANG D, YANG X, CHENG Y, DU W, GAI J. Genome-wide QTL-allele dissection of 100-seed weight in the Northeast China soybean germplasm population. Scientia Agricultura Sinica, 2020, 53(9): 1704-1729. (in Chinese)

[12] PAN L, HE J, ZHAO J, WANG W, XING G, YU D, ZHANG X, LI C, CHEN S, GAI J. Detection power of RTM-GWAS applied to 100-seed weight QTL identification in a recombinant inbred lines population of soybean. Scientia Agricultura Sinica, 2020, 53(9): 1730-1742. (in Chinese)

[13] LI S, CAO Y, HE J, WANG W, XING G, YANG J, ZHAO T, GAI J. Genetic dissection of protein content in a nested association mapping population of soybean. Scientia Agricultura Sinica, 2020, 53(9): 1743-1755. (in Chinese)

[14] LIU Z, MENG S, HE J, XING G, WANG W, ZHAO T, GAI J. A comparative study on linkage and association QTL mapping for seed isoflavone contents in a recombinant inbred line population of soybean. Scientia Agricultura Sinica, 2020, 53(9): 1756-1772. (in Chinese)

[15] GAI J, CHEN, ZHANG, ZHAO T, XING G, XING. Genome-wide genetic dissection of germplasm resources and implications for breeding by design in soybean. Breeding Science, 2012, 61(5): 495-510. DOI: 10.1270/jsbbs.61.495; PMID: 23136489.

"Breeding by Design" as a concept described by Peleman and van der Voort aims to bring together superior alleles for all genes of agronomic importance from potential genetic resources. This might be achievable through high-resolution allele detection based on precise QTL (quantitative trait locus/loci) mapping of potential parental resources. The present paper reviews the works at the Chinese National Center for Soybean Improvement (NCSI) on exploration of QTL and their superior alleles of agronomic traits for genetic dissection of germplasm resources in soybeans towards practicing "Breeding by Design". Among the major germplasm resources, i.e. released commercial cultivar (RC), farmers' landrace (LR) and annual wild soybean accession (WS), the RC was recognized as the primary potential adapted parental sources, with a great number of new alleles (45.9%) having emerged and accumulated during the 90 years' scientific breeding processes. A mapping strategy, i.e. a full model procedure (including additive (A), epistasis (AA), A × environment (E) and AA × E effects), scanning with QTLNetwork2.0 and followed by verification with other procedures, was suggested and used for the experimental data when the underlying genetic model was usually unknown. In total, 110 data sets of 81 agronomically important traits were analyzed for their QTL, with 14.5% of the data sets showing major QTL (contribution rate more than 10.0% for each QTL), 55.5% showing a few major QTL but more small QTL, and 30.0% having only small QTL. In addition to the detected QTL, the collective unmapped minor QTL sometimes accounted for more than 50% of the genetic variation in a number of traits. Integrated with linkage mapping, association mappings were conducted on germplasm populations and validated to be able to provide complete information on multiple QTL and their multiple alleles. Accordingly, the QTL and their alleles of agronomic traits for large samples of RC, LR and WS were identified and then the QTL-allele matrices were established. Based on which the parental materials can be chosen for complementary recombination among loci and alleles to make the crossing plans genetically optimized. This approach has provided a way towards breeding by design, but the accuracy will depend on the precision of the loci and allele matrices.

[16] SUL J, MARTIN L, ESKIN. Population structure in genetic studies: Confounding factors and mixed models. PLoS Genetics, 2018, 14(12): e1007309. DOI: 10.1371/journal.pgen.1007309; PMID: 30589851.

A genome-wide association study (GWAS) seeks to identify genetic variants that contribute to the development and progression of a specific disease. Over the past 10 years, new approaches using mixed models have emerged to mitigate the deleterious effects of population structure and relatedness in association studies. However, developing GWAS techniques to accurately test for association while correcting for population structure is a computational and statistical challenge. Using laboratory mouse strains as an example, our review characterizes the problem of population structure in association studies and describes how it can cause false positive associations. We then motivate mixed models in the context of unmodeled factors.

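[17] HE J, LIU F, XING G, WANG W, ZHAO T, GUAN R, GAI J. Characterization and analytical programs of the restricted two-stage multi-locus genome-wide association analysis. Acta Agronomica Sinica, 2018, 44(9): 1274-1289. (in Chinese) DOI: 10.3724/SP.J.1006.2018.01274.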

Genome-wide association studies (GWAS) have been widely used for genetic dissection of quantitative trait loci (QTL), and the previous GWAS procedures were concentrated on finding a handful of major loci, while the plant breeders are more likely interested in exploring the whole QTL system for both forward selection and background control. We proposed the restricted two-stage multi-locus genome-wide association analysis (RTM-GWAS, https://github.com/njau-sri/rtm-gwas/) for a relatively thorough detection of QTL and their multiple alleles. Firstly, RTM-GWAS groups the tightly linked sequential SNPs into linkage disequilibrium blocks (SNPLDBs) to form genomic markers with multiple haplotypes as alleles. Secondly, it utilizes two-stage association analysis based on a multi-locus multi-allele model to save computer space for focusing on genome-wide QTL identification along with their multiple alleles. Compared with the previous GWAS methods, RTM-GWAS takes the trait heritability as the upper limit of detected genetic contribution, which can avoid a large amount of false positives for a precise detection of the QTL system of the trait. The QTL-allele matrix as a compact form of the population genetic constitution can be used to design optimal genotypes, to predict optimal crosses in plant breeding, and to study the genetic properties of the population as well as the novel and newly emerged alleles. In the present study, we first introduced the function and usage of the RTM-GWAS analytical programs, and then used the experimental data from a research program on soybean to illustrate the application details of the RTM-GWAS.


[18] BUCKLER E, HOLLAND J, BRADBURY P, et al. The genetic architecture of maize flowering time. Science, 2009, 325(5941): 714-718. DOI: 10.1126/science.1174276; PMID: 19661422.

Flowering time is a complex trait that controls adaptation of plants to their local environment in the outcrossing species Zea mays (maize). We dissected variation for flowering time with a set of 5000 recombinant inbred lines (maize Nested Association Mapping population, NAM). Nearly a million plants were assayed in eight environments but showed no evidence for any single large-effect quantitative trait loci (QTLs). Instead, we identified evidence for numerous small-effect QTLs shared among families; however, allelic effects differ across founder lines. We identified no individual QTLs at which allelic effects are determined by geographic origin or large effects for epistasis or environmental interactions. Thus, a simple additive model accurately predicts flowering time for maize, in contrast to the genetic architecture observed in the selfing plant species rice and Arabidopsis.

[19] MEUWISSEN T, HAYES B, GODDARD M. Prediction of total genetic value using genome-wide dense marker maps. Genetics, 2001, 157(4): 1819-1829. PMID: 11290733.

Recent advances in molecular genetic techniques will make dense marker maps available and genotyping many individuals for these markers feasible. Here we attempted to estimate the effects of approximately 50,000 marker haplotypes simultaneously from a limited number of phenotypic records. A genome of 1000 cM was simulated with a marker spacing of 1 cM. The markers surrounding every 1-cM region were combined into marker haplotypes. Due to finite population size N(e) = 100, the marker haplotypes were in linkage disequilibrium with the QTL located between the markers. Using least squares, all haplotype effects could not be estimated simultaneously. When only the biggest effects were included, they were overestimated and the accuracy of predicting genetic values of the offspring of the recorded animals was only 0.32. Best linear unbiased prediction of haplotype effects assumed equal variances associated to each 1-cM chromosomal segment, which yielded an accuracy of 0.73, although this assumption was far from true. Bayesian methods that assumed a prior distribution of the variance associated with each chromosome segment increased this accuracy to 0.85, even when the prior was not correct. It was concluded that selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals and plants, especially if combined with reproductive techniques to shorten the generation interval.