Publication in refereed journal
CUHK researchers (current staff)
Professor 张元亭 (Department of Electronic Engineering)
Professor 甄秉言 (Department of Medicine and Therapeutics)
Full text
Digital Object Identifier (DOI): http://dx.doi.org/10.1016/j.datak.2008.08.008
Citation count
Web of Science: http://aims.cuhk.edu.hk/converis/portal/Publication/26 (WOS source URL)
Scopus: http://aims.cuhk.edu.hk/converis/portal/Publication/34 (Scopus source URL)
Other information
Abstract: This paper investigates a framework that actively selects informative document pairs for obtaining user feedback for semi-supervised document clustering. A gain-directed document pair selection method that measures how much we can learn by revealing judgments of selected document pairs is designed. We use the estimation of term co-occurrence probabilities as a clue for finding informative document pairs. Term co-occurrence probabilities are considered in the semi-supervised document clustering process to capture term-to-term dependence relationships. In the semi-supervised document clustering, each cluster is represented by a language model. We have conducted extensive experiments on several real-world corpora. The results demonstrate that our proposed framework is effective. © 2008 Elsevier B.V. All rights reserved.
Authors: Huang R., Lam W.
Journal name: DATA & KNOWLEDGE ENGINEERING
Publication year: 2009
Month: 1
Day: 1
Volume: 68
Issue: 1
Publisher: Elsevier BV
Place of publication: Netherlands
Pages: 49 - 67
ISSN: 0169-023X
Language: British English
Keywords: Active learning, Document clustering, Language modeling, Semi-supervised