MHC-I epitope presentation prediction based on transfer learning
Weipeng Hu1,2,3, Youping Li2,3,4, Xiuqing Zhang2,3,4
1. School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
2. BGI-Shenzhen, Shenzhen 518083, China
3. BGI-GenoImmune, Wuhan 4300794, China
4. BGI Education Center, University of Chinese Academy of Sciences, Shenzhen 518083, China
Supported by the National Natural Science Foundation of China (Nos. 81702826 and 81772910), the Science, Technology and Innovation Commission of Shenzhen Municipality (No. JCYJ20170303151334808) and the Shenzhen Municipal Government of China (No. 20170731162715261).
About the authors: Weipeng Hu, master's student, specializing in genomics. E-mail: huweipeng@genomics.cn
Abstract Accurate prediction of epitope presentation is a key step in neoantigen-based tumour immunotherapy, which targets T cell-specific epitopes. Epitopes identified by mass spectrometry (MS) are valuable for training epitope presentation prediction models. Although MS data are accumulating rapidly, the number of epitopes matched to most human leukocyte antigen (HLA) alleles remains relatively small, which makes it difficult to build a reliable prediction model. This study therefore used transfer learning: a model was first pre-trained on epitopes pooled across alleles to learn the features they share, and this pre-trained model was then trained further on allele-specific epitopes to obtain the final epitope presentation prediction model, termed Pluto. On the same validation dataset, the average 0.1% positive predictive value (PPV) of Pluto exceeded that of the model without pre-training by 0.078. On external HLA eluted ligand datasets, Pluto achieved an average 0.1% PPV of 0.4255, higher than the model without pre-training (0.3824) and other popular methods, including MixMHCpred (0.3369), NetMHCpan4.0-EL (0.4000), NetMHCpan4.0-BA (0.3188) and MHCflurry (0.3002). Moreover, in the evaluation of immunogenicity prediction, Pluto identified more neoantigens than the other tools. Pluto is publicly available at https://github.com/weipenegHU/Pluto.
Keywords: immunotherapy; neoantigen; epitope presentation; deep learning; transfer learning
Cite this article: Weipeng Hu, Youping Li, Xiuqing Zhang. MHC-I epitope presentation prediction based on transfer learning[J]. Hereditas (Beijing), 2019, 41(11): 1041-1049 doi:10.16288/j.yczz.19-155
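The pre-trained model consists of an input layer, five hidden layers and an output layer (Fig. 1A). The five hidden layers contain 100, 30, 100, 30 and 10 neurons, respectively. Dropout (dropout rate = 0.4) is applied to the first and third hidden layers to control overfitting, and every hidden layer uses the exponential linear unit (ELU) as its activation function. The pre-trained model was trained by batch gradient descent, with each batch containing 1024 peptides (half positive and half negative), for a total of 100 iterations. Five-fold cross-validation was used to evaluate the accuracy of the pre-trained model.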
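The layer layout and the balanced batches described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the layer sizes, dropout rate, dropout positions, ELU activation and batch size come from the text, while the input dimension (`INPUT_DIM`), the single sigmoid output unit, the weight initialisation and all function names are assumptions made for the example.

```python
import math
import random

HIDDEN_SIZES = [100, 30, 100, 30, 10]  # from the text
DROPOUT_AFTER = {0, 2}                 # first and third hidden layers (0-based)
DROPOUT_RATE = 0.4                     # from the text
BATCH_SIZE = 1024                      # from the text
INPUT_DIM = 180                        # ASSUMPTION: length of the encoded peptide


def elu(x, alpha=1.0):
    """Exponential linear unit."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def init_layers(input_dim=INPUT_DIM, hidden=HIDDEN_SIZES, seed=0):
    """Create (weights, biases) for each hidden layer plus one output unit."""
    rng = random.Random(seed)
    sizes = [input_dim] + list(hidden) + [1]  # single output unit is an ASSUMPTION
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # simple uniform init, also an ASSUMPTION
        w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers


def forward(x, layers, training=False, rng=random):
    """Forward pass for one encoded peptide; returns a score in (0, 1)."""
    h = x
    for i, (w, b) in enumerate(layers[:-1]):
        h = [elu(sum(wi * hi for wi, hi in zip(row, h)) + bi) for row, bi in zip(w, b)]
        if training and i in DROPOUT_AFTER:
            keep = 1.0 - DROPOUT_RATE
            # Inverted dropout: zero units at random and rescale the survivors.
            h = [hi / keep if rng.random() < keep else 0.0 for hi in h]
    w, b = layers[-1]
    z = sum(wi * hi for wi, hi in zip(w[0], h)) + b[0]
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid output is an ASSUMPTION


def balanced_batch(positives, negatives, batch_size=BATCH_SIZE, rng=random):
    """Draw one batch with equal numbers of positive and negative peptides."""
    half = batch_size // 2
    batch = [(p, 1) for p in rng.sample(positives, half)]
    batch += [(n, 0) for n in rng.sample(negatives, half)]
    rng.shuffle(batch)
    return batch
```

In practice such a network would be built and trained with a deep learning framework. The sketch only makes the stated architecture and the half-positive, half-negative batch composition concrete.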
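Fig. 3 Independent evaluation on external mass spectrometry data set
The average 0.1% PPV of Pluto is significantly higher than that of the model trained from scratch, MixMHCpred, NetMHCpan4.0-BA and MHCflurry. Although the average 0.1% PPV of Pluto is not significantly higher than that of NetMHCpan4.0-EL, Pluto performs better than NetMHCpan4.0-EL on every allele. * denotes P<0.05 and ** denotes P<0.005 (paired t-test).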
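The two quantities behind this comparison can be illustrated with a short sketch. It assumes the common definition of 0.1% PPV as the fraction of true ligands among the top-scoring 0.1% of candidate peptides, which this excerpt does not spell out. The function names are hypothetical, and the second function returns only the paired t statistic, since a P value also requires the t distribution with n - 1 degrees of freedom.

```python
import math


def ppv_at_top_fraction(scores, labels, fraction=0.001):
    """Fraction of positives (label 1) among the top `fraction` of scores.

    ASSUMPTION: the cutoff is taken as round(N * fraction), with at least one peptide.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    k = max(1, int(round(len(scores) * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def paired_t_statistic(a, b):
    """t statistic of a paired t-test on per-allele values a[i] versus b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

Here `a` and `b` would hold the per-allele 0.1% PPV values of two methods evaluated on the same alleles, which is what makes the test paired.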