Abstract: The rapid growth of online user-generated content has created broad demand for unsupervised sentiment classification methods. Existing unsupervised methods based on sentiment words ignore the influence of sentence types and inter-sentence relations on sentiment, and therefore classify poorly. Unsupervised methods based on self-learning, in turn, introduce many errors when generating pseudo-labelled datasets. To address these problems, this paper proposes an unsupervised sentiment classification method based on multi-granularity computing and multi-criteria fusion. Multi-granularity computing improves the accuracy of sentiment-word-based unsupervised classification, and multi-criteria fusion reduces the error rate of the pseudo-labelled data produced during self-learning. Experiments on three real Chinese review datasets show that the method improves classification accuracy by 6.5% on average compared with existing unsupervised sentiment classification methods. |
Key words: sentiment classification, unsupervised methods, multi-granularity computing, multi-criteria fusion |
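The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, under stated assumptions: sentiment is scored at word, sentence, and document granularity, and a pseudo label is kept only when more than one criterion agrees. The toy lexicon, the negation rule, the position weighting, the two criteria, and all function names are hypothetical illustrations and are not taken from the paper. English whitespace tokens are used for readability; Chinese reviews would first need word segmentation.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon; not the lexicon used in the paper.
LEXICON: Dict[str, float] = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATORS = {"not", "never"}


def word_scores(tokens: List[str]) -> List[float]:
    """Word granularity: score sentiment words, flipping the polarity
    of the next sentiment word that follows a negator."""
    scores: List[float] = []
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        s = LEXICON.get(tok, 0.0)
        if s != 0.0:
            scores.append(-s if negate else s)
            negate = False
    return scores


def sentence_score(sentence: str) -> float:
    """Sentence granularity: sum of the word scores in one sentence."""
    return sum(word_scores(sentence.lower().split()))


def document_score(sentences: List[str]) -> float:
    """Document granularity: later sentences get larger weights
    (an assumed heuristic, chosen only for illustration)."""
    n = len(sentences)
    return sum((i + 1) / n * sentence_score(s) for i, s in enumerate(sentences))


def pseudo_label(sentences: List[str], min_margin: float = 1.0) -> Optional[int]:
    """Return +1 or -1 only when both assumed criteria agree, else None."""
    doc = document_score(sentences)
    per_sentence = [sentence_score(s) for s in sentences]
    votes = sum(1 if s > 0 else -1 for s in per_sentence if s != 0)
    confident = abs(doc) >= min_margin   # criterion 1: score margin is large enough
    agree = doc * votes > 0              # criterion 2: sentence majority has the same sign
    if confident and agree:
        return 1 if doc > 0 else -1
    return None  # too uncertain to use as pseudo-labelled training data


if __name__ == "__main__":
    print(pseudo_label(["the screen is great", "battery is not bad"]))
    print(pseudo_label(["the screen is great", "service is terrible"]))
```

In a self-learning setup of this kind, only the reviews that receive a non-`None` label would be passed on as pseudo-labelled training data for a supervised classifier; the rest would be left for that classifier to predict.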
Received: 2014-12-25 Published: 2015-08-04 |
Corresponding author: HUANG Yongfeng (黄永峰), Professor, E-mail: yfhuang@tsinghua.edu.cn |