
Machine Learning Models for Genetic Risk Assessment of Infants with Non-syndromic Orofacial Cleft

Posted by site editor, Free考研考试, 2022-01-03

Non-syndromic cleft lip with or without cleft palate (NSCL/P), the isolated type of orofacial cleft, is the second most common birth defect in China, and Asians have the highest incidence in the world. NSCL/P involves multiple genes and complex interactions between genetic and environmental factors, which makes genetic assessment difficult for an unborn fetus carrying multiple NSCL/P-susceptibility variants. Although genome-wide association studies (GWAS) have uncovered dozens of single nucleotide polymorphism (SNP) loci in different ethnic populations, the diagnostic effectiveness of these SNPs requires further experimental validation in Chinese populations before a diagnostic panel or a predictive model covering multiple SNPs can be built. In this study, we collected blood samples from control and NSCL/P infants in Han and Uyghur Chinese populations to validate the diagnostic effectiveness of 43 candidate SNPs previously detected by GWAS. We then built predictive models with the validated SNPs using different machine learning algorithms and evaluated their prediction performance. Logistic regression performed best for risk assessment, as measured by the area under the receiver operating characteristic curve (AUC). Notably, defective variants in MTHFR and RBP4, two genes involved in folic acid metabolism and vitamin A transport, respectively, contributed strongly to NSCL/P incidence according to feature importance evaluation with logistic regression. This is consistent with the notion that folic acid and vitamin A are both essential nutritional supplements that help pregnant women reduce the risk of having a baby with NSCL/P. Moreover, we observed lower predictive power in Uyghur than in Han cases, likely due to differences in genetic background between the two ethnic populations. Thus, our study highlights the urgent need to generate a HapMap for the Uyghur population and to perform resequencing-based screening for Uyghur-specific NSCL/P markers.
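As a rough illustration of the modeling step described above, the following Python sketch fits a logistic regression to additively coded SNP genotypes (0, 1, or 2 copies of the risk allele), scores a held-out split by AUC, and ranks SNPs by absolute coefficient as a simple stand-in for feature importance. Everything in it is an assumption made for illustration: the data are simulated, the effect sizes and allele frequency are arbitrary, the function names are hypothetical, and plain gradient descent stands in for whatever implementation the study actually used.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, effects, intercept, rng, maf=0.3):
    """Simulate genotypes (0/1/2 risk alleles) and case/control labels (assumed model)."""
    X, y = [], []
    for _ in range(n):
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in effects]
        logit = intercept + sum(e * gj for e, gj in zip(effects, g))
        X.append(g)
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y


def fit_logistic(X, y, lr=0.5, iters=300):
    """Fit intercept and weights by full-batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
    return w, b


def auc(scores, labels):
    """Rank-based AUC: chance that a case outscores a control, ties count one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
effects = [2.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical: only SNP 0 affects risk
X, y = simulate(600, effects, -2.0, rng)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

w, b = fit_logistic(X_train, y_train)
test_scores = [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X_test]
test_auc = auc(test_scores, y_test)

# Rank SNPs by absolute coefficient (comparable here because all share one coding).
ranking = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
print("held-out AUC:", round(test_auc, 3))
print("SNPs ranked by |coefficient|:", ranking)
```

Coefficient magnitude is only a fair importance measure when all features share the same scale, as the 0/1/2 coding does here; with mixed feature types, standardization or a permutation-based measure would be a more defensible choice.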
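Cleft lip and palate is one of the most common birth defects of the oral and maxillofacial region. Its incidence varies by ethnicity and is highest in East Asians (1/500). In addition, the incidence among the Uyghur, China's second-largest ethnic minority, (1.96/1000) is also higher than the Chinese average (1.42/1000). The etiology of cleft lip and palate is complex, involving both genetic factors (such as variants in the IRF6 gene) and environmental factors (such as maternal nutritional status, smoking, and alcohol), which makes genetic risk assessment difficult. To date, several genome-wide association studies (GWAS) have identified single nucleotide polymorphism (SNP) loci associated with cleft lip and palate, but the genetic contribution of each individual locus remains unclear. In this study, we collected 103 Han patients, 279 Uyghur patients, 504 Han controls, and 205 Uyghur controls. From six GWAS articles on cleft lip and palate published in high-profile journals up to December 2017, we selected a total of 43 associated SNP loci and genotyped every subject at these 43 loci. We then used different machine learning algorithms to build risk prediction models separately in the Uyghur and Han populations and compared the predictive power of the algorithms. Among the seven algorithms, logistic regression performed best: in the Han population the area under the receiver operating characteristic curve (area under the curve, AUC) reached 0.90, whereas in the Uyghur population predictive power was lower, with an AUC of only 0.64. By stepwise adding and stepwise removing the 43 SNP loci during model construction, we further selected 6 loci. A model built with these 6 loci also predicted the risk of cleft lip and palate in the Han population well, with an AUC of 0.87. Of these 6 SNP loci, 4 are related to nutrient metabolism, including rs1801133 and rs1801131 in the coding region of the folate metabolism gene MTHFR, and rs10882272 in the non-coding region of the gene encoding the vitamin A transport protein RBP4. Thus, a machine learning model built from a small number of SNP loci can predict the risk of cleft lip and palate in the Han population well. The model may have clinical potential, but it still needs further validation in more populations. In addition, variants in genes related to nutrient metabolism may play an important role in the development of cleft lip and palate. We speculate that, for women who are preparing for pregnancy or are in early pregnancy, especially carriers of the corresponding defective genes, individualized supplementation with folic acid and/or vitamin A at doses matched to their variant genotypes may reduce the risk of cleft lip and palate in the fetus.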
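The stepwise selection of a small SNP panel mentioned above can be sketched as a greedy forward search: start with no SNPs and repeatedly add the one that most improves AUC on a validation split. This is a minimal sketch under stated assumptions, not the study's procedure. It only adds SNPs, whereas the study also removed them stepwise; the genotypes are simulated; the effect sizes, sample sizes, and function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, iters=200):
    """Logistic regression by full-batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
    return w, b


def auc(scores, labels):
    """Rank-based AUC with ties counted as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def holdout_auc(cols, train, valid):
    """Fit on the chosen SNP columns of the training split, score the validation split."""
    (X_tr, y_tr), (X_va, y_va) = train, valid
    pick = lambda X: [[row[j] for j in cols] for row in X]
    w, b = fit_logistic(pick(X_tr), y_tr)
    scores = [b + sum(wj * xj for wj, xj in zip(w, row)) for row in pick(X_va)]
    return auc(scores, y_va)


def forward_select(n_snps, k, train, valid):
    """Greedily add the SNP that gives the highest validation AUC until k are chosen."""
    chosen, history = [], []
    while len(chosen) < k:
        trials = {j: holdout_auc(chosen + [j], train, valid)
                  for j in range(n_snps) if j not in chosen}
        best = max(trials, key=trials.get)
        chosen.append(best)
        history.append(trials[best])
    return chosen, history


# Simulated data (assumption): 6 SNPs, only SNPs 0 and 3 affect risk.
rng = random.Random(1)
effects = [2.0, 0.0, 0.0, 2.0, 0.0, 0.0]
X, y = [], []
for _ in range(700):
    g = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in effects]
    logit = -2.5 + sum(e * gj for e, gj in zip(effects, g))
    X.append(g)
    y.append(1 if rng.random() < sigmoid(logit) else 0)

train, valid = (X[:400], y[:400]), (X[400:], y[400:])
chosen, history = forward_select(len(effects), 2, train, valid)
print("selected SNP indices:", chosen)
print("validation AUC after each addition:", [round(a, 3) for a in history])
```

Because the same validation split both guides the selection and reports the AUC, the printed values are optimistic. An unbiased estimate for the final panel would need a separate test set or nested cross-validation.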





Full-text PDF download:

http://gpb.big.ac.cn/articles/download/668