
Academy of Mathematics and Systems Science, Chinese Academy of Sciences: Faculty Profile of Yan Fu (付岩)


Our research areas: bioinformatics, biostatistics, data mining and machine learning. We currently focus on developing and applying computational and statistical algorithms and software tools for mass spectrometry-based proteomics, e.g., protein identification; identification, discovery and localization of post-translational modifications; protein quantification; proteotypic peptide prediction; false discovery rate control; and multiple hypothesis testing.
Our research is interdisciplinary, spanning statistics, computation and biology, with a current focus on computational and statistical proteomics. Our representative research results are summarized below.

1. Algorithms and software for protein and post-translational modification identification and quantification

Searching mass spectrometry data against protein databases to identify protein sequences and post-translational modifications is central to proteomics research. In 2004, we proposed a new scoring function named "kernelized Spectral Vector Dot Product (KSDP)", and developed pFind 1.0, the first protein identification search engine in China (Bioinformatics, 2004,20:1948~1954). Since then, pFind has been developed continuously for years and evolved into the well-known pFind protein identification system and pFind research group (http://pfind.ict.ac.cn).

The huge number of unexpected post-translational modifications on proteins is considered the "dark matter" of proteomic data. We have developed a variety of modification discovery algorithms. We proposed pMatch, an open mass spectral library search algorithm that discovers unexpected modifications by comparing the similarity between modified and unmodified spectra. The pMatch paper was accepted and presented at ISMB 2010, one of the top bioinformatics conferences, and published in Bioinformatics (2010). pMatch is now frequently cited in the field of spectral library search and modification discovery. Based on pMatch, we recently developed pMatchGlyco, an algorithm for identifying glycosylation modifications (BioMed Research International, 2018).

We developed DeltAMT, an algorithm that clusters mass spectra using peptide mass and retention time information to discover high-abundance modification types (Molecular & Cellular Proteomics, 2011). In a study of core fucosylated glycoprotein identification conducted with the State Key Laboratory of Proteomics of China, DeltAMT and other data analysis methods were used to identify what was then the largest set of core fucosylated sites (Molecular & Cellular Proteomics, 2009).

We developed PTMiner, a high-accuracy probabilistic algorithm for modification localization and quality control in open (mass-tolerant) database search (Molecular & Cellular Proteomics, 2019). The algorithm automatically learns the prior probability, the mass-matching error distribution and the matching-peak intensity distribution from the mass spectral data through an iterative process, and uses the continuously updated prior and the two distributions to estimate the posterior probability of each candidate modification site more accurately. We used PTMiner to analyze the modifications present in the large-scale data of the human proteome draft and localized more than one million modifications at 1% FDR, systematically characterizing known and unknown modifications in the human proteome. When published online, the paper was at one point the journal's second "most read" paper. Based on the PTMiner algorithm, we developed SAVControl, a quality control method for protein amino acid mutations (which can be treated as a special type of modification), published in Journal of Proteomics (2018).

In protein quantification, mass spectrometry measurements are subject to substantial randomness: 1) some peptides are detected while others are not, and 2) peptides at the same concentration may differ greatly in mass spectrometry signal intensity. These sources of randomness seriously reduce the accuracy of protein quantification. To address these problems, we proposed the concept of the quantitative mass-spectrometry efficiency of peptides and developed LFAQ, a new absolute protein quantification algorithm based on predicted peptide quantitative efficiencies (Analytical Chemistry, 2019a). We then proposed incorporating peptide digestibility into the peptide detectability prediction model and developed AP3, a peptide detectability prediction algorithm based on the random forest machine learning method (Analytical Chemistry, 2019b).

2. Proteomics data FDR control methods and applications

While big data give us great opportunities to discover new knowledge, they also carry substantial risks of false discoveries. False discovery rate (FDR) analysis in high-dimensional statistical inference is considered one of the most important advances in statistics. In multiple hypothesis testing, the FDR is defined as the expected proportion of falsely rejected hypotheses among all rejected hypotheses. The original paper proposing the FDR (Benjamini and Hochberg, J. R. Stat. Soc. B, 1995) has been cited more than 57,000 times, which shows its importance and influence. Leading FDR researchers include the statisticians Bradley Efron, John Storey and Emmanuel Candès.
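As a concrete illustration of FDR control in multiple hypothesis testing, the following minimal Python sketch implements the Benjamini-Hochberg step-up procedure from the 1995 paper cited above. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions and are not taken from the group's software.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected by the BH step-up procedure.

    Illustrative sketch: assumes independent (or positively dependent) p-values.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value passes its rank-dependent threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(example_p, alpha=0.05)
print(rejected)
```

With these example values only the two smallest p-values pass their thresholds, so two hypotheses are rejected.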

In particular, accurately estimating the FDR of subgroups of hypothesis tests is a difficult problem, first raised by Bradley Efron (Ann. Appl. Stat. 2:197-223, 2008). This problem is practically important in proteomics. For the first time, we studied mathematically the problem of FDR estimation for subgroups of peptide identifications (such as modified peptides) in proteomic data analysis. Via Bayesian analysis, we proved that the subgroup FDR and the combined FDR are in general not equal under the same scoring threshold. We therefore proposed the principle of separate subgroup filtering and FDR estimation, and derived a series of theoretical results (Statistics and Its Interface, 2012).
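The inequality between the subgroup FDR and the combined FDR can be seen from Bayes' rule alone. The notation below is an assumption made for illustration and is not necessarily that of the cited paper: $S$ is the identification score, $x$ the score threshold, $F$ the event that an identification is false, and $M$ the event that it belongs to the subgroup (e.g., modified peptides). The FDR is treated here in its Bayesian form, as a posterior probability.

```latex
\begin{align}
\mathrm{FDR}(x)   &= P(F \mid S \ge x), \\
\mathrm{FDR}_M(x) &= P(F \mid M,\, S \ge x)
                   = \mathrm{FDR}(x)\,
                     \frac{P(M \mid F,\, S \ge x)}{P(M \mid S \ge x)} .
\end{align}
```

The two quantities coincide only when $P(M \mid F, S \ge x) = P(M \mid S \ge x)$, that is, when subgroup membership is independent of correctness among the accepted identifications. If false identifications fall into the subgroup more often than identifications overall, the subgroup FDR exceeds the combined FDR at the same threshold.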

Based on the above theoretical analysis, we proposed a simpler and more intuitive relationship between the subgroup FDR and the combined FDR, and further developed Transfer FDR, an accurate FDR estimation method for small subgroups of peptide identifications (Molecular & Cellular Proteomics, 2014). The rationale of Transfer FDR is as follows. When the abundance of the modification to be identified is low, direct FDR estimation is severely inaccurate because the sample size is insufficient. Based on the observation and analysis of real data, we devised an estimation method for the conditional probability that an erroneously identified peptide is a modified peptide. From this estimate, a quantitative relationship between the subgroup FDR of modified peptides and the combined FDR of all peptides is obtained. Through this relationship, the subgroup FDR can be predicted indirectly from the combined FDR, which can usually be estimated accurately. This overcomes the difficulty of estimating the FDR of a small subgroup with too few samples.
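The arithmetic behind this indirect prediction can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published implementation: the function name `transferred_subgroup_fdr` and its parameters are hypothetical, and `gamma` (the conditional probability that an erroneous identification falls in the subgroup) is supplied by the caller, because the way the paper estimates it is not reproduced here.

```python
def transferred_subgroup_fdr(combined_fdr, n_all, n_subgroup, gamma):
    """Predict the subgroup FDR from the combined FDR (illustrative sketch).

    combined_fdr: estimated FDR of all accepted identifications
    n_all:        number of accepted identifications in total
    n_subgroup:   number of accepted identifications in the subgroup
    gamma:        assumed P(subgroup | erroneous identification)
    """
    if n_subgroup <= 0:
        raise ValueError("the subgroup must contain at least one identification")
    # Expected number of erroneous identifications overall.
    expected_false_all = combined_fdr * n_all
    # Expected number of those errors that fall into the subgroup.
    expected_false_subgroup = gamma * expected_false_all
    # A rate cannot exceed 1.
    return min(1.0, expected_false_subgroup / n_subgroup)


# Hypothetical numbers: 10,000 identifications at 1% combined FDR, 200 of them
# modified, and 10% of erroneous identifications assumed to be modified.
example = transferred_subgroup_fdr(
    combined_fdr=0.01, n_all=10000, n_subgroup=200, gamma=0.1
)
print(example)
```

In this hypothetical case the subgroup FDR is about 5%, five times the combined FDR, even though both are evaluated at the same score threshold.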

We applied the above subgroup FDR analysis and the Transfer FDR method to a number of special identification problems. For example, in a study of FDR estimation for novel genes identified by six-frame translation in proteogenomics, we found that when the combined FDR is used, the gene annotation ratio is the dominant factor affecting the actual FDR of novel genes (novel peptides) (Bioinformatics, 2015). The Transfer FDR method was also applied successfully to the quality control of open modification search (Molecular & Cellular Proteomics, 2019) and of amino acid mutation identification (Journal of Proteomics, 2018). In addition, it was used in a collaborative study of primate-specific gene identification (Genome Research, 2019).

3. Statistical inference and data mining

In the process of analyzing biological data, we developed several general statistical inference and data mining methods, going one step forward from applied research to methodological and theoretical research.

The target-decoy competition (TDC) strategy is the gold standard method for FDR control of proteomic data. Although it has been used for many years, it has remained an empirical method without a theoretical foundation. In this method, the ratio of the number of decoy results to the number of target results is usually used as an estimate of the FDR, but whether this controls the FDR (that is, keeps the actual FDR below a specified threshold) was unknown. We found that a +1 correction to this estimate (adding 1 to the decoy count) strictly controls the FDR, and we gave a theoretical proof of this result (arXiv, 2015).
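The +1 corrected estimate is easy to state in code. The sketch below is a minimal illustration, assuming that competition has already taken place so that each spectrum contributes one best match labelled target or decoy; the function name `tdc_score_threshold` and the example scores are hypothetical.

```python
def tdc_score_threshold(scores, is_decoy, alpha=0.01):
    """Return the lowest score threshold whose +1 corrected estimate is <= alpha.

    The estimate at a threshold is (number of decoys + 1) / number of targets,
    counted over matches scoring at or above the threshold.
    Returns None if no threshold satisfies the bound.
    """
    ranked = sorted(zip(scores, is_decoy), key=lambda pair: pair[0], reverse=True)
    targets = decoys = 0
    best = None
    for i, (score, decoy) in enumerate(ranked):
        if decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after the last match of a group of tied scores.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if targets > 0 and (decoys + 1) / targets <= alpha:
            best = score
    return best


# Hypothetical post-competition matches; alpha is large because the list is tiny.
scores = [9.1, 8.7, 8.2, 7.9, 7.5, 7.0, 6.4, 6.0]
is_decoy = [False, False, False, False, False, True, False, True]
threshold = tdc_score_threshold(scores, is_decoy, alpha=0.25)
accepted = [s for s, d in zip(scores, is_decoy) if not d and s >= threshold]
print(threshold, accepted)
```

Note that with the +1 term at least 1/alpha target matches are needed before anything can be accepted, which is why the correction matters most for small result lists.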

More importantly, we extended the corrected TDC method to the general multiple hypothesis testing problem (arXiv, 2018). Previous FDR control methods in multiple hypothesis testing were usually based on a null distribution of the test statistic. However, all types of null distributions, including theoretical, permutation-based and empirical ones, have inherent drawbacks. For example, the theoretical null distribution fails if the assumptions about the sample distribution are wrong. In addition, many FDR control methods require an estimate of the proportion of true null hypotheses, which is difficult to obtain and remains an open problem. We proposed a general TDC-based FDR control method using random permutations. Our method does not need to estimate the null distribution of the statistic or the proportion of true null hypotheses; it relies only on the ranking of the tests by some statistic or score. It constructs competitive decoy hypotheses from random sample permutations. We proved that this method rigorously controls the FDR. Simulation experiments show that our method controls the FDR more effectively than the Bayes and empirical Bayes methods and has greater statistical power.
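To make the idea of decoy permutations concrete, here is a simplified Python sketch for a two-group comparison. It is an illustration under our own assumptions, not the algorithm of the cited preprint: the statistic (absolute difference of group means), the use of one independent label permutation per test, the coin flip on ties, and all function names are hypothetical choices.

```python
import random
from statistics import mean


def abs_mean_diff(values, labels):
    """Score of one test: absolute difference between the two group means."""
    group1 = [v for v, g in zip(values, labels) if g == 1]
    group0 = [v for v, g in zip(values, labels) if g == 0]
    return abs(mean(group1) - mean(group0))


def decoy_permutation_rejections(data, labels, alpha=0.1, seed=0):
    """Return indices of rejected tests using permutation decoys (sketch)."""
    rng = random.Random(seed)
    competed = []  # one (score, is_decoy, test_index) entry per test
    for j, values in enumerate(data):
        target = abs_mean_diff(values, labels)
        permuted = labels[:]
        rng.shuffle(permuted)  # decoy: same data, randomly permuted labels
        decoy = abs_mean_diff(values, permuted)
        if target > decoy:
            competed.append((target, False, j))
        elif decoy > target:
            competed.append((decoy, True, j))
        else:
            competed.append((target, rng.random() < 0.5, j))  # tie: fair coin
    competed.sort(key=lambda item: item[0], reverse=True)
    targets = decoys = 0
    cutoff = 0  # length of the longest prefix satisfying the +1 corrected bound
    for i, (_, is_decoy, _) in enumerate(competed, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and (decoys + 1) / targets <= alpha:
            cutoff = i
    return sorted(j for _, is_decoy, j in competed[:cutoff] if not is_decoy)


# Simulated example: 40 tests, 5 vs 5 samples; the first 20 tests have a shift.
data_rng = random.Random(1)
labels = [1] * 5 + [0] * 5
data = []
for j in range(40):
    shift = 5.0 if j < 20 else 0.0
    data.append([data_rng.gauss(shift if g == 1 else 0.0, 1.0) for g in labels])
rejected = decoy_permutation_rejections(data, labels, alpha=0.1, seed=0)
print(rejected)
```

The sketch never computes a p-value or a null distribution; only the ranking of the competed scores and their target or decoy labels enter the decision.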

Prof. Emmanuel Candès, a well-known statistician at Stanford University, developed the knockoff filter method in collaboration with Rina Foygel Barber (Annals of Statistics, 43:2055, 2015); it is quite similar to our general TDC-FDR method. However, our "+1" correction and FDR control theorem were given earlier, though in the context of mass spectrometry (Kun He, master's thesis, 2013). As recognized by Prof. William Noble of the University of Washington and Prof. Uri Keich of the University of Sydney in their recent papers (Journal of Proteome Research, 18:585-593, 2019; arXiv: 1907.01458, 2019), our results and those of Candès were obtained independently:

“The +1 correction was proved by Barber and Candès (The Annals of Statistics, 43:2055, 2015) in the context of linear regression (see their “knockoff+” procedure) and by He et al. (arXiv, 2015) in the context of mass spectrometry (see their equation 25). ”
—— Cited from (Journal of Proteome Research, 18:585-593, 2019)

“The TDC approach has been theoretically established (subject to a small finite-sample correction) by He et al.(arXiv, 2015) and independently, and in a much wider context, by Barber and Candès (The Annals of Statistics, 43:2055, 2015).”
—— Cited from (arXiv: 1907.01458, 2019)

In addition, to solve the problem of protein homology prediction, we proposed several learning-to-rank algorithms based on kernel machines (e.g., SVM). With local data normalization and support-vector down-sampling, we won the Champion Award (Tied for 1st Place Overall, Honorable Mentions for Squared Error and Average Precision in the protein homology prediction task) in the ACM KDDCUP-2004 data mining competition. This was the first time that Chinese researchers won the championship in KDDCUP, the most influential data mining competition worldwide. A query-adaptive ensemble learning algorithm was proposed later and achieved better performance (ISBRA, 2011).

Selected recent publications

    Xinpei Yi, Fuzhou Gong*, Yan Fu*. Transfer posterior error probability estimation for peptide identification. BMC Bioinformatics, 21:173, 2020.
    Qingbo Shu#, Mengjie Li#, Lian Shu#, Zhiwu An, Jifeng Wang, Hao Lv, Ming Yang, Tanxi Cai, Tony Hu, Yan Fu* and Fuquan Yang*. Large-scale Identification of N-linked Glycopeptides in Human Serum using HILIC Enrichment and Spectral Library Search. Molecular & Cellular Proteomics, 19:672–689, 2020.
    Zhiqiang Gao#, Cheng Chang#, Jinghan Yang, Yunping Zhu*, Yan Fu*. AP3: An Advanced Proteotypic Peptide Predictor for Targeted Proteomics by Incorporating Peptide Digestibility. Analytical Chemistry, 91, 8705−8711, 2019.
    Zhiwu An#, Linhui Zhai#, Wantao Ying, Xiaohong Qian, Fuzhou Gong*, Minjia Tan* and Yan Fu*. PTMiner: Localization and Quality Control of Protein Modifications Detected in an Open Search and Its Application to Comprehensive Post-translational Modification Characterization in Human Proteome. Molecular & Cellular Proteomics, 18(2) 391-405, 2019.
    Cheng Chang#, Zhiqiang Gao#, Wantao Ying#, Yan Fu*, Yan Zhao, Songfeng Wu, Mengjie Li, Guibin Wang, Xiaohong Qian*, Yunping Zhu*, Fuchu He*. LFAQ: towards unbiased label-free absolute protein quantification by predicting peptide quantitative factors. Analytical Chemistry, 91, 1335−1343, 2019.
    Yi Shao, Chunyan Chen, Hao Shen, Bin Z He, Daqi Yu, Shuai Jiang, Shilei Zhao, Zhiqiang Gao, Zhenglin Zhu, Xi Chen, Yan Fu, Hua Chen, Ge Gao, Manyuan Long, Yong E Zhang. GenTree, an integrated resource for analyzing the evolution and function of primate-specific coding genes. Genome Research, 29(4):682-696, 2019.
    Xinpei Yi#, Bo Wang#, Zhiwu An, Fuzhou Gong*, Jing Li*, Yan Fu*, Quality control of single amino acid variations detected by tandem mass spectrometry, Journal of Proteomics, 187:144–151, 2018.
    Zhiwu An#, Qingbo Shu#, Hao Lv, Lian Shu, Jifeng Wang, Fuquan Yang*, Yan Fu*, N-Linked Glycopeptide Identification Based on Open Mass Spectral Library Search, BioMed Research International, doi.org/10.1155/2018/1564136, 2018.
    Yan Fu, Data Analysis Strategies for Protein Modification Identification, In Klaus Jung (Ed.): Statistical Analysis in Proteomics, Humana Press, New York, NY,pp1362:265-75, 2016.
    Kun Zhang#, Yan Fu*, Wen-Feng Zeng, Kun He, Hao Chi, Chao Liu, Yan-Chang Li, Yuan Gao, Ping Xu*, Si-Min He*, A note on the false discovery rate of novel peptides in proteogenomics. Bioinformatics, 31(20):3249-3253, 2015.
    Shan Lu, Sheng-Bo Fan, Bing Yang, Yu-Xin Li, Jia-Ming Meng, Long Wu, Pin Li, Kun Zhang, Mei-Jun Zhang, Yan Fu, Jin-Cai Luo, Rui-Xiang Sun, Si-Min He, Meng-Qiu Dong, Mapping native disulfide bonds at a proteome scale. Nature Methods, 12:329-331, 2015.
    Yan Fu*, Xiaohong Qian, Transferred subgroup false discovery rate for rare post-translational modifications detected by mass spectrometry, Molecular & Cellular Proteomics, 13(5):1359-1368, 2014.
    Yan Fu, Kernel Methods and Applications in Bioinformatics. In Kasabov, Nikola K. (Ed.): Handbook of Bio-/Neuro-Informatics, Springer-Verlag Berlin and Heidelberg GmbH & Co. K, pp275-285, 2013.
    Yan Fu, Bayesian false discovery rates for post-translational modification proteomics, Statistics and Its Interface, 5(1):47-59, 2012.
    Yan Fu*, Li-Yun Xiu, Wei Jia, Ding Ye, Rui-Xiang Sun, Xiao-Hong Qian, Si-Min He. DeltAMT: A Statistical Algorithm for Fast Detection of Protein Modifications From LC-MS/MS Data, Molecular & Cellular Proteomics, 10(5):1-15, 2011.
    Yan Fu*, Rong Pan, Qiang Yang, Wen Gao. Query-Adaptive Ranking with Support Vector Machines for Protein Homology Prediction. In Proceedings of the 7th International Symposium on Bioinformatics Research and Applications (ISBRA2011). Lecture Notes in Bioinformatics, 6674:320–331, 2011.
    Ding Ye#, Yan Fu*, Rui-Xiang Sun*, Hai-Peng Wang, Zuo-Fei Yuan, Hao Chi, Si-Min He, Open MS/MS spectral library search to identify unanticipated post-translational modifications and increase spectral identification rate. In Proceedings of the 18th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2010). Bioinformatics, 26(12):i399-i406, 2010.
    Wei Jia#, Zhuang Lu#, Yan Fu#, Hai-Peng Wang, Le-Heng Wang, Hao Chi, Zuo-Fei Yuan, Zhao-Bin Zheng, Li-Na Song, Huan-Huan Han, Yi-Min Liang, Jing-Lan Wang, Yun Cai, Yu-Kui Zhang, Yu-Lin Deng, Wan-Tao Ying*, Si-Min He*, Xiao-Hong Qian*, A Strategy for Precise and Large Scale Identification of Core Fucosylated Glycoproteins, Molecular & Cellular Proteomics, 8(5):913-923, 2009.
    Yan Fu#*, Wei Jia, Zhuang Lu, Haipeng Wang, Zuofei Yuan, Hao Chi, You Li, Liyun Xiu, Wenping Wang, Chao Liu, Leheng Wang, Ruixiang Sun, Wen Gao, Xiaohong Qian, Si-Min He. Efficient discovery of abundant post-translational modifications and spectral pairs using peptide mass and retention time differences. BMC Bioinformatics (APBC2009), 10:S50-S50, 2009.
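    Zhiwu An#, Yan Fu*, Mass spectrometry-based algorithms for protein modification localization. Chemistry of Life (生命的化学), 37(1):104-112, 2017. (in Chinese)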

Manuscripts in submission & preprints

    Jinghan Yang#, Zhiqiang Gao#, Xiuhan Ren, Jie Sheng, Ping Xu, Cheng Chang*, Yan Fu*. DeepDigest: prediction of protein proteolytic digestion with deep learning. bioRxiv 2020.03.13.990200; doi: https://doi.org/10.1101/2020.03.13.990200
    Feng Xu#*, Li Yu#, Xuehui Peng#, Junling Zhang, Suzhen Li, Shu Liu, Yanan Yi, Zhiwu An, Fuqiang Wang, Yan Fu*, Ping Xu*. The use of LysargiNase complementary to trypsin in large scale phosphoproteome study for unambiguous phosphosite localization. 2019
    Kun He#, Mengjie Li, Yan Fu*, Fuzhou Gong, Xiaoming Sun. Null-free False Discovery Rate Control Using Decoy Permutations for Multiple Testing. arXiv:1804.08222. 2018.
    Kun He#, Yan Fu*, Wen-Feng Zeng, Lan Luo, Hao Chi, Chao Liu, Lai-Yun Qing, Rui-Xiang Sun, and Si-Min He. A theoretical foundation of the target-decoy search strategy for false discovery rate control in proteomics. arXiv:1501.00537. 2015.


Software

PTMiner

A software tool for localization and quality control of protein modifications detected by both open and closed search

SAVControl

A software tool for quality control of single amino acid variations detected by tandem mass spectrometry

pMatchGlyco

A software tool for N-Linked glycopeptide identification based on open mass spectral library search

LFAQ

A software tool for unbiased label-free absolute protein quantification by predicting peptide quantitative factors

AP3

A software tool for prediction of proteotypic peptides in proteomics using random forest algorithm

Yan Fu
PhD, Associate Professor, Doctoral Supervisor

Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Address: No.55 Zhongguancun East Road, Haidian District, Beijing, 100190, China

E-mail: yfu(at)amss(dot)ac(dot)cn

Website: http://fugroup.amss.ac.cn/


Research Interests

Data mining and bioinformatics. The current focus is computational proteomics and mass spectrometry: algorithms and software tools for protein identification from LC-MS/MS data, post-translational modification discovery, reranking of search results, multiple hypothesis testing, etc.

Positions and Education

Associate Professor (2011 - ), Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Associate Professor (2009 - 2011) and Assistant Professor (2007 - 2009), Institute of Computing Technology, Chinese Academy of Sciences

Ph.D. (2000 - 2007), Institute of Computing Technology, Chinese Academy of Sciences

Selected publications

Xinpei Yi, Fuzhou Gong*, Yan Fu*. Transfer posterior error probability estimation for peptide identification. BMC Bioinformatics, 21:173, 2020.

Qingbo Shu#, Mengjie Li#, Lian Shu#, Zhiwu An, Jifeng Wang, Hao Lv, Ming Yang, Tanxi Cai, Tony Hu, Yan Fu* and Fuquan Yang*. Large-scale Identification of N-linked Glycopeptides in Human Serum using HILIC Enrichment and Spectral Library Search. Molecular & Cellular Proteomics, 19:672–689, 2020.

Zhiqiang Gao#, Cheng Chang#, Jinghan Yang, Yunping Zhu*, Yan Fu*. AP3: An Advanced Proteotypic Peptide Predictor for Targeted Proteomics by Incorporating Peptide Digestibility. Analytical Chemistry, 2019, 91, 8705−8711.

Zhiwu An#, Linhui Zhai#, Wantao Ying, Xiaohong Qian, Fuzhou Gong*, Minjia Tan* and Yan Fu*. PTMiner: Localization and Quality Control of Protein Modifications Detected in an Open Search and Its Application to Comprehensive Post-translational Modification Characterization in Human Proteome. Molecular & Cellular Proteomics, 2019, 18 (2) 391-405.

Cheng Chang#, Zhiqiang Gao#, Wantao Ying#, Yan Fu*, Yan Zhao, Songfeng Wu, Mengjie Li, Guibin Wang, Xiaohong Qian*, Yunping Zhu*, Fuchu He*. LFAQ: towards unbiased label-free absolute protein quantification by predicting peptide quantitative factors. Analytical Chemistry, 2019, 91, 1335−1343.

Yi Shao, Chunyan Chen, Hao Shen, Bin Z He, Daqi Yu, Shuai Jiang, Shilei Zhao, Zhiqiang Gao, Zhenglin Zhu, Xi Chen, Yan Fu, Hua Chen, Ge Gao, Manyuan Long, Yong E Zhang. GenTree, an integrated resource for analyzing the evolution and function of primate-specific coding genes. Genome Research, 29(4):682-696, 2019.

Xinpei Yi#, Bo Wang#, Zhiwu An, Fuzhou Gong*, Jing Li*, Yan Fu*, Quality control of single amino acid variations detected by tandem mass spectrometry, Journal of Proteomics, 187:144–151, 2018.

Zhiwu An#, Qingbo Shu#, Hao Lv, Lian Shu, Jifeng Wang, Fuquan Yang*, Yan Fu*, N-Linked Glycopeptide Identification Based on Open Mass Spectral Library Search, BioMed Research International, doi.org/10.1155/2018/1564136, 2018.

Yan Fu, Data Analysis Strategies for Protein Modification Identification, In Klaus Jung (Ed.): Statistical Analysis in Proteomics, Humana Press, New York, NY,pp1362:265-75, 2016.

Kun Zhang#, Yan Fu*, Wen-Feng Zeng, Kun He, Hao Chi, Chao Liu, Yan-Chang Li, Yuan Gao, Ping Xu*, Si-Min He*. A note on the false discovery rate of novel peptides in proteogenomics. Bioinformatics, 31(20):3249-3253, 2015.

Shan Lu, Sheng-Bo Fan, Bing Yang, Yu-Xin Li, Jia-Ming Meng, Long Wu, Pin Li, Kun Zhang, Mei-Jun Zhang, Yan Fu, Jin-Cai Luo, Rui-Xiang Sun, Si-Min He, Meng-Qiu Dong. Mapping native disulfide bonds at a proteome scale. Nature Methods, 12:329-331, 2015.

Yan Fu* and Xiaohong Qian. Transferred Subgroup False Discovery Rate for Rare Post-translational Modifications Detected by Mass Spectrometry. Molecular & Cellular Proteomics, 13(5):1359-1368, 2014.

Yan Fu. Kernel Methods and Applications in Bioinformatics. In Kasabov, Nikola K. (Ed.): Handbook of Bio-/Neuro-Informatics, Springer-Verlag Berlin and Heidelberg GmbH & Co. K, pp275-285, 2013.

Yan Fu. Bayesian false discovery rates for post-translational modification proteomics. Statistics and Its Interface, 5:47–59, 2012.

Zuo-Fei Yuan, Chao Liu, Hai-Peng Wang, Rui-Xiang Sun, Yan Fu, Jing-Fen Zhang, Le-Heng Wang, Hao Chi, You Li, Li-Yun Xiu, Wen-Ping Wang, Si-Min He. pParse: a method for accurate determination of monoisotopic peaks in high-resolution mass spectra. Proteomics, 12(2): 226–235, 2012.

Yan Fu, Liyun Xiu, Wei Jia, Ding Ye, Ruixiang Sun, Xiaohong Qian, Si-min He. DeltAMT: a statistical algorithm for fast detection of protein modifications from LC-MS/MS data. Molecular & Cellular Proteomics, 10(5):M110.000455, 2011.

Yan Fu, Rong Pan, Qiang Yang, Wen Gao. Query-Adaptive Ranking with Support Vector Machines for Protein Homology Prediction. In Proceedings of the 7th International Symposium on Bioinformatics Research and Applications (ISBRA2011). Lecture Notes in Bioinformatics, 6674:320–331, 2011.

Ding Ye, Yan Fu*, Ruixiang Sun*, Haipeng Wang, Zuofei Yuan, Hao Chi and Simin He*. Open MS/MS Spectral Library Search to Identify Unanticipated Post-Translational Modifications and Increase Spectral Identification Rate. In Proceedings of the 18th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2010). Bioinformatics, 26(12):i399-i406, 2010.

Yan Fu*, Wei Jia, Zhuang Lu, Haipeng Wang, Zuofei Yuan, Hao Chi, You Li, Liyun Xiu, Wenping Wang, Chao Liu, Leheng Wang, Ruixiang Sun, Wen Gao, Xiaohong Qian, Si-Min He. Efficient discovery of abundant post-translational modifications and spectral pairs using peptide mass and retention time differences. The Seventh Asia-Pacific Bioinformatics Conference (APBC 2009). BMC Bioinformatics. 10(Suppl 1):S50, 2009.

Wei Jia#, Zhuang Lu#, Yan Fu#, Hai-Peng Wang, Le-Heng Wang, Hao Chi, Zuo-Fei Yuan, Zhao-Bin Zheng, Li-Na Song, Huan-Huan Han, Yi-Min Liang, Jing-Lan Wang, Yun Cai, Yu-Kui Zhang, Yu-Lin Deng, Wan-Tao Ying, Si-Min He, and Xiao-Hong Qian. A strategy for precise and large-scale identification of core fucosylated glycoproteins. Molecular & Cellular Proteomics. 8:913-923, 2009.

Yan Fu, Wen Gao, Simin He, Ruixiang Sun, Hu Zhou, Rong Zeng. Mining Tandem Mass Spectral Data to Develop a More Accurate Mass Error Model for Peptide Identification. Pacific Symposium on Biocomputing (PSB) 12:421-432, 2007.

Le-Heng Wang, De-Quan Li, Yan Fu, Hai-Peng Wang, Jing-Fen Zhang, Zuo-Fei Yuan, Rui-Xiang Sun, Rong Zeng, Si-Min He, Wen Gao. pFind 2.0: a software package for peptide and protein identification via tandem mass spectrometry. Rapid Communications in Mass Spectrometry, 21, 2985-2991, 2007.

Haipeng Wang, Yan Fu, Ruixiang Sun, Simin He, Rong Zeng, and Wen Gao. An SVM Scorer for More Sensitive and Reliable Peptide Identification via Tandem Mass Spectrometry. Pacific Symposium on Biocomputing (PSB) 11:303-314, 2006.

Dequan Li, Yan Fu, Ruixiang Sun, Charles X. Ling, Yonggang Wei, Hu Zhou, Rong Zeng, Qiang Yang, Simin He and Wen Gao. pFind: a novel database-searching software system for automated peptide and protein identification via tandem mass spectrometry. Bioinformatics, 21(13), pp3049-3050, 2005.

Yan Fu, Ruixiang Sun, Qiang Yang, Simin He, Chunli Wang, Haipeng Wang, Shiguang Shan, Junfa Liu, Wen Gao. A Block-Based Support Vector Machine Approach to the Protein Homology Prediction Task in KDD Cup 2004. ACM SIGKDD Explorations. Vol.6, No.2, pp120-124, 2004.

Yan Fu, Qiang Yang, Ruixiang Sun, Dequan Li, Rong Zeng, Charles X. Ling, Wen Gao. Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry. Bioinformatics. Vol.20, pp1948-1954, 2004.

Yan Fu, Qiang Yang, Charles X. Ling, Haipeng Wang, Dequan Li, Ruixiang Sun, Hu Zhou, Rong Zeng, Yiqiang Chen, Simin He, Wen Gao. A Kernel-based Case Retrieval Algorithm with Application to Bioinformatics. In Proceedings of the 8th Pacific Rim International Conference on Artificial Intelligence (PRICAI 2004), Auckland, New Zealand, August 9-13, 2004, LNAI 3157, pp. 544–553.

Yan Fu, Simin He, Ruixiang Sun, Leheng Wang. A review of key computational problems in tandem mass spectrometry-based protein identification. Information Technology Letter, 8(1):16-32, 2010. (in Chinese)

Yan Fu. Machine Learning Based Bioinformation Retrieval. Doctoral dissertation, Chinese Academy of Sciences, 2007. (in Chinese)

Ruixiang Sun, Yan Fu, Dequan Li, Jingfen Zhang, Xiaobiao Wang, Quanhu Sheng, Rong Zeng, Yiqiang Chen, Simin He, Wen Gao. Mass Spectrometry-Based Computational Proteomics Research. SCIENCE IN CHINA Ser. E Information Sciences. 36(2), 222-234, 2006. (in Chinese)

Yiqiang Chen, Wen Gao, Yan Fu, Dequan Li, Xiang Chen. Research on Protein Recognition base on Information Technology. Chinese Bulletin of Life Sciences, Vol.15, No.2, pp70-78, 2003. (in Chinese)

Yan Fu, Yaowei Wang, Weiqiang Wang, Wen Gao. Content-Based Natural Image Classification and Retrieval Using SVM. Chinese Journal of Computers, Vol.26, No.10, pp.1261-1265, 2003. (in Chinese)

Yan Fu, Tiejun Huang, Ke Yu, Tao Li, Hao Zhang. Overview of Interactive Model of Computing. Chinese Journal of Computer Research and Development, vol.39, no.6, pp701-706, 2002. (in Chinese)

Software tools

PTMiner: Localization and Quality Control of Protein Modifications Detected by Open Search

SAVControl: Quality control of single amino acid variations detected by tandem mass spectrometry

pMatchGlyco: N-Linked Glycopeptide Identification Based on Open Mass Spectral Library Search

LFAQ: Unbiased label-free absolute protein quantification by predicting peptide quantitative factors

AP3: Prediction of proteotypic peptides in proteomics using random forest algorithm

TransferPEP: Transfer posterior error probability (local FDR) estimation for peptide identification

pFind: a database-searching engine for peptide & protein identification via tandem mass spectrometry

pMatch: an open MS/MS library search tool for identification of peptides and their modifications

pCluster: a clustering tool for modification detection using LC, MS or MS/MS information

Honors

Microsoft Fellowship 2004, Microsoft Research Asia

Champion of ACM KDD Cup 2004 data mining competition
