As a newly identified protein post-translational modification, malonylation is involved in a variety of biological functions. Recognizing malonylation sites in substrates is an initial but crucial step in elucidating the molecular mechanisms underlying protein malonylation. In this study, we constructed a deep learning (DL) classifier based on long short-term memory (LSTM) with word embedding (LSTMWE) to predict mammalian malonylation sites. LSTMWE performs better than traditional classifiers built on common pre-defined feature encodings, and better than a DL classifier based on LSTM with one-hot encoding. The performance of LSTMWE is sensitive to the size of the training set, but this limitation can be overcome by integrating it with a traditional machine learning (ML) classifier. Accordingly, we developed an integrated approach called LEMP, which combines LSTMWE with a random forest classifier that uses a novel encoding of enhanced amino acid content. LEMP not only outperforms the individual classifiers but is also superior to the currently available malonylation predictors. Additionally, it performs promisingly at a low false positive rate, which is highly useful in practical prediction applications. Overall, LEMP is a useful tool for easily identifying malonylation sites with high confidence. LEMP is available at http://www.bioinfogo.org/lemp.
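To make the two input representations concrete, here is a minimal sketch of how a lysine-centred peptide might be encoded for each branch. Every specific choice below is an assumption, not a detail from the abstract: the 31-residue window (15 residues on either side of K), the `X` padding symbol, index 0 reserved for padding, and a sub-window size of 5 for the enhanced amino acid content (EAAC) features. The EAAC function follows the usual sliding-window composition and may differ from the paper's novel variant. The helper names (`lysine_windows`, `to_tokens`, `one_hot`, `eaac`) are hypothetical.

```python
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed padding symbol for windows that run past either sequence end
# Index 0 is reserved for padding/unknown residues; standard residues map to 1..20.
TOKEN: Dict[str, int] = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def lysine_windows(seq: str, half: int = 15) -> List[Tuple[int, str]]:
    """Return (0-based position, window) for every lysine; windows have length 2*half + 1."""
    padded = PAD * half + seq + PAD * half
    # seq[i] sits at padded[i + half], so the window centred on it starts at padded[i].
    return [(i, padded[i : i + 2 * half + 1]) for i, aa in enumerate(seq) if aa == "K"]


def to_tokens(window: str) -> List[int]:
    """Integer indices, the input format for an embedding layer (0 = padding/unknown)."""
    return [TOKEN.get(aa, 0) for aa in window]


def one_hot(window: str) -> List[List[int]]:
    """One-hot alternative: a 21-dimensional 0/1 vector per residue (slot 0 = padding/unknown)."""
    width = len(AMINO_ACIDS) + 1
    return [[1 if j == t else 0 for j in range(width)] for t in to_tokens(window)]


def eaac(window: str, size: int = 5) -> List[float]:
    """Sliding-window amino acid composition: 20 frequencies per sub-window of `size`."""
    features: List[float] = []
    for start in range(len(window) - size + 1):
        # Padding is excluded from the denominator; an all-padding sub-window yields zeros.
        residues = [c for c in window[start : start + size] if c in TOKEN]
        n = len(residues) or 1
        features.extend(residues.count(aa) / n for aa in AMINO_ACIDS)
    return features


if __name__ == "__main__":
    toy = "MSKAEKLLGK"  # toy sequence for illustration, not data from the paper
    for pos, win in lysine_windows(toy, half=3):
        print(pos, win, to_tokens(win), len(eaac(win, size=3)))
```

Integer tokens keep each residue as a single index whose dense vector an embedding layer can learn, whereas `one_hot` fixes each residue to a sparse 21-dimensional vector. This is the distinction between LSTMWE and the one-hot LSTM baseline compared in the abstract.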
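Protein malonylation is a recently discovered protein post-translational modification that occurs on the side chain of lysine residues and plays important roles in a variety of metabolic regulatory processes. Recent studies have shown that malonylation regulates energy cycling through the glycolytic pathway, plays a key role in glucose and fatty acid metabolism, and is closely associated with type 2 diabetes. Precisely identifying modification sites on modified substrates is therefore a critical step in studying its molecular mechanisms. In this study, we first used the long short-term memory (LSTM) algorithm to build a deep learning model, LSTMWE, to predict malonylation sites in mammals. Test results show that the performance of LSTMWE is sensitive to the amount of training data: when the training set is sufficiently large, LSTMWE has a decisive advantage in predictive performance over traditional classifiers based on pre-defined feature encodings, but this advantage disappears when the data set is small. To make our prediction framework applicable to data sets of different sizes, we built an integrated model, LEMP, by combining LSTMWE with a random forest classifier that uses a novel encoding of enhanced amino acid content. LEMP not only shows relatively stable predictive performance across data sets of different sizes, but also outperforms the currently published malonylation prediction methods. Overall, LEMP is an effective tool for predicting malonylation sites. LEMP is currently available at http://www.bioinfogo.org/lemp.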
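The integration of LSTMWE with the random forest classifier can be illustrated with a simple score-level fusion. This is a minimal sketch under stated assumptions, not LEMP's actual method: the abstracts do not say how the two outputs are combined, so a weighted average of the two positive-class probabilities (hypothetical weight `w`) stands in for it. The cutoff is then chosen from scores on known negatives so that the false positive rate stays at or below a target, reflecting the high-confidence, low-false-positive use case the abstract emphasizes. The names `combine` and `threshold_at_fpr` are illustrative.

```python
from typing import List, Sequence


def combine(p_lstm: Sequence[float], p_rf: Sequence[float], w: float = 0.5) -> List[float]:
    """Weighted average of two classifiers' positive-class probabilities."""
    if len(p_lstm) != len(p_rf):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * a + (1.0 - w) * b for a, b in zip(p_lstm, p_rf)]


def threshold_at_fpr(neg_scores: Sequence[float], max_fpr: float) -> float:
    """Cutoff t such that calling `score > t` positive flags at most max_fpr of the negatives."""
    if not neg_scores:
        raise ValueError("need at least one negative score")
    ranked = sorted(neg_scores, reverse=True)
    k = int(max_fpr * len(ranked))  # number of false positives that can be tolerated
    if k >= len(ranked):
        return float("-inf")  # every negative may be flagged
    # At most k negatives score strictly above the (k+1)-th largest negative score.
    return ranked[k]


if __name__ == "__main__":
    # Toy scores for illustration only; they are not results from the paper.
    lstm = [0.9, 0.2, 0.7, 0.4]
    rf = [0.8, 0.3, 0.5, 0.1]
    fused = combine(lstm, rf, w=0.5)
    cutoff = threshold_at_fpr([0.05, 0.1, 0.2, 0.3, 0.6], max_fpr=0.2)
    print([s > cutoff for s in fused])
```

In practice `w` and `max_fpr` would be tuned on held-out data; a larger `w` leans on the LSTM branch, which, according to the abstract, pays off mainly when training data are plentiful.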
Full-text PDF: http://gpb.big.ac.cn/articles/download/680
Integration of A Deep Learning Classifier with A Random Forest Approach for Predicting Malonylation
Posted by the site editor, Free考研考试, 2022-01-03