Publication in refereed journal
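Researchers at The Chinese University of Hong Kong (current staff)
Professor 沈祖尧 (Department of Medicine and Therapeutics)
Mr. 谢志恆 (Department of Medicine and Therapeutics)
Professor 莫树锦 (Department of Clinical Oncology)
Professor 徐国荣 (School of Biomedical Sciences)
Professor 陈力元 (Department of Medicine and Therapeutics)
Mr. 李健康 (Department of Computer Science and Engineering)
Professor 梁广锡 (Department of Computer Science and Engineering)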
Full text
Digital Object Identifier (DOI): http://dx.doi.org/10.1109/TCBB.2009.6
Times cited
Web of Science: http://aims.cuhk.edu.hk/converis/portal/Publication/14 (WOS source URL)
Other information
Abstract: Extraction of meaningful information from large experimental data sets is a key element in bioinformatics research. One of the challenges is to identify genomic markers in Hepatitis B Virus (HBV) that are associated with HCC (liver cancer) development by comparing the complete genomic sequences of HBV among patients with HCC and those without HCC. In this study, a data mining framework, which includes molecular evolution analysis, clustering, feature selection, classifier learning, and classification, is introduced. Our research group has collected HBV DNA sequences, either genotype B or C, from over 200 patients specifically for this project. In the molecular evolution analysis and clustering, three subgroups have been identified in genotype C and a clustering method has been developed to separate the subgroups. In the feature selection process, potential markers are selected based on Information Gain for further classifier learning. Then, meaningful rules are learned by our algorithm called the Rule Learning, which is based on Evolutionary Algorithm. Also, a new classification method by Nonlinear Integral has been developed. Good performance of this method comes from the use of the fuzzy measure and the relevant nonlinear integral. The nonadditivity of the fuzzy measure reflects the importance of the feature attributes as well as their interactions. These two classifiers give explicit information on the importance of the individual mutated sites and their interactions toward the classification (potential causes of liver cancer in our case). A thorough comparison study of these two methods with existing methods is detailed. For genotype B, genotype C subgroups C1, C2, and C3, important mutation markers (sites) have been found, respectively. These two classification methods have been applied to classify never-seen-before examples for validation. The results show that the classification methods have more than 70 percent accuracy and 80 percent sensitivity for most data sets, which are considered high as an initial scanning method for liver cancer diagnosis.
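The feature selection step mentioned in the abstract ranks candidate mutation sites by Information Gain. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: the function names (`entropy`, `information_gain`, `rank_sites`), the toy sequences, and the labels are all hypothetical and were made up for this example. It assumes the sequences are already aligned and of equal length.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on one feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_sites(sequences, labels):
    """Return (site_index, gain) pairs sorted by decreasing information gain."""
    n_sites = len(sequences[0])
    gains = [(i, information_gain([s[i] for s in sequences], labels))
             for i in range(n_sites)]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: four aligned sequences with made-up class labels.
sequences = ["ACGT", "ACGA", "TCGT", "TCGA"]
labels = ["HCC", "HCC", "control", "control"]
ranking = rank_sites(sequences, labels)
```

In this toy input only site 0 separates the two classes, so it ranks first; the other sites carry no information about the label. A real pipeline would keep the top-ranked sites as inputs to classifier learning.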
Authors: Leung KS, Lee KH, Wang JF, Ng EYT, Chan HLY, Tsui SKW, Mok TSK, Tse PCH, Sung JJY
Journal name: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Year of publication: 2011
Month: 3
Day: 1
Volume: 8
Issue: 2
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 428-440
ISSN: 1545-5963
eISSN: 1557-9964
Language: British English
Keywords: Data mining; DNA sequences of HBV; mutation sites; nonlinear integrals; rule learning; the signed fuzzy measures
Web of Science subject categories: Biochemical Research Methods; Biochemistry & Molecular Biology; Computer Science; Computer Science, Interdisciplinary Applications; Mathematics; Mathematics, Interdisciplinary Applications; Statistics & Probability