Adaptive Sentiment Classification Based on a Common Feature Space (基于公共特征空间的自适应情感分类)

洪文兴1,杞坚玮1,王玮玮1,郑晓晴1,翁洋2

AuthorsListE: Hong Wenxing1, Qi Jianwei1, Wang Weiwei1, Zheng Xiaoqing1, Weng Yang2

Unit: 1. 厦门大学航空航天学院,厦门 361005;
2. 四川大学数学学院,成都 610064

Unit_English: 1. School of Aerospace Engineering, Xiamen University, Xiamen 361005, China;
2. School of Mathematics, Sichuan University, Chengdu 610064, China

Abstract_Chinese: 针对情感分类这一项从文章或句子中得到观点态度的任务,常规情感分类模型大多需要耗费大量人力获取标注数据.为解决某些领域缺乏标注数据,且其他领域分类器无法在目标领域直接使用的现状,设计了一种新颖的基于构建公共特征空间方法,使分类模型可从有标注领域向无标注领域进行迁移适应,减少人工标注的成本开销,实现情感分类的领域自适应.该方法以大规模语料下预训练的词向量信息作为以词为元素的特征,在同种语言中表达情感所采用的句法结构相似这一假设前提下,通过对领域内特有的领域特征词进行替换的方式构建有标注数据集与无标注数据集基本共有的公共特征空间,使有标注数据集与无标注数据集实现信息共享.以此为基础借助深度学习中卷积神经网络采用不同尺寸卷积核对词语不同范围的上下文特征进行抽取学习,进而采用半监督学习与微调学习相结合的方式从有标注数据集向未标注数据集开展领域自适应.在来自京东与携程共5 个领域的真实电商数据集上进行实验,分别研究了领域特征词选择方法及其词性约束对领域间适应能力的影响,结果表明:相较于不采用领域适应的模型,可提升平均2.7%的准确率;且在来自亚马逊电商的公开数据集实验中,通过与现有方法进行对比,验证了该方法的有效性.

Abstract_English: Sentiment classification, which extracts opinions from sentences or documents, has been studied extensively. Most conventional sentiment classification models require costly manual annotation to obtain labeled data. To address the problem that some domains lack labeled data and a classifier trained on another domain cannot be applied directly to the target domain, we propose a novel domain adaptation method that constructs a common feature space. The method lets a classifier trained on a labeled domain adapt to an unlabeled domain, which reduces the cost of manual labeling and achieves domain adaptation for sentiment classification. It uses word vectors pre-trained on a large corpus as word-level features. Under the assumption that the syntactic structures used to express sentiment within one language are similar, it constructs a common feature space largely shared by the labeled and unlabeled data sets by replacing the domain-specific words unique to each domain, so that the two data sets can share information. On this basis, a convolutional neural network uses convolution kernels of different sizes to extract context features over different ranges of words, and a combination of semi-supervised learning and fine-tuning adapts the model from the labeled domain to the unlabeled domain. In experiments on real e-commerce data from five domains collected from Jingdong (JD.com) and Xiecheng (Ctrip), we examined how the domain-word selection method and its part-of-speech (POS) constraints affect cross-domain adaptation; the model improves accuracy by about 2.7% on average over a model without domain adaptation. In addition, a comparison with existing methods on a public Amazon data set verifies the effectiveness of the method.

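The word-replacement step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the frequency-ratio selection rule, the `<DOMAIN>` placeholder token, the function names `domain_specific_words` and `to_common_space`, and the toy corpora are all assumptions made for illustration, since the abstract does not specify how domain-specific words are selected or what replaces them.

```python
from collections import Counter


def domain_specific_words(source_docs, target_docs, min_count=2, ratio=2.0):
    """Pick words whose smoothed relative frequency differs strongly between domains.

    Hypothetical selection rule for illustration only; the abstract does not
    spell out the selection methods the paper compares.
    """
    src = Counter(w for doc in source_docs for w in doc)
    tgt = Counter(w for doc in target_docs for w in doc)
    vocab = set(src) | set(tgt)
    # Add-one smoothing so a word absent from one domain does not divide by zero.
    src_total = sum(src.values()) + len(vocab)
    tgt_total = sum(tgt.values()) + len(vocab)
    specific = set()
    for word in vocab:
        if src[word] + tgt[word] < min_count:
            continue  # too rare to judge
        p_src = (src[word] + 1) / src_total
        p_tgt = (tgt[word] + 1) / tgt_total
        if max(p_src / p_tgt, p_tgt / p_src) >= ratio:
            specific.add(word)
    return specific


def to_common_space(doc, specific, placeholder="<DOMAIN>"):
    """Replace domain-specific words with one shared placeholder token (assumed)."""
    return [placeholder if w in specific else w for w in doc]


# Toy tokenized reviews: a labeled source domain and an unlabeled target domain.
source = [["the", "phone", "is", "great"], ["the", "phone", "screen", "is", "bad"]]
target = [["the", "hotel", "is", "great"], ["the", "hotel", "room", "is", "bad"]]

specific = domain_specific_words(source, target)
shared_source = [to_common_space(d, specific) for d in source]
shared_target = [to_common_space(d, specific) for d in target]
```

In this toy example the first review of each domain maps to the same token sequence once the domain-specific words are replaced, which is the sense in which the two data sets come to share a feature space. In a full pipeline the placeholder would also need its own word vector before the sequences are passed to a classifier; how that is handled is likewise an assumption here.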
Keyword_Chinese: 情感分类;领域自适应;半监督学习;特征重构

Keywords_English: sentiment classification; domain adaptation; semi-supervised learning; feature reconstruction


Full-text PDF download: http://xbzrb.tju.edu.cn/#/digest?ArticleID=6248