Authors: ZHANG Chun-xiang (张春祥), PANG Shu-yang (逄淑阳), GAO Xue-yao (高雪瑶)
Abstract: To improve the disambiguation accuracy of biomedical abbreviations, a semi-supervised abbreviation disambiguation method that combines asymmetric convolutional neural networks (ACNN) and bidirectional long short-term memory (Bi-LSTM) networks is proposed. With the abbreviation as the center, the word form, part of speech, and semantic information of the four adjacent lexical units to its left and right are extracted as disambiguation features. The training corpus is expanded using the XGBoost and LightGBM algorithms, and the expanded corpus is then fed into the model. ACNN and Bi-LSTM are used to extract features, and a softmax function performs semantic classification. The MSH corpus is used to optimize the model and to test its disambiguation performance. Experimental results show that the proposed model effectively improves abbreviation disambiguation accuracy while requiring only a small amount of annotated corpus.
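The feature-extraction step described in the abstract, taking the lexical units adjacent to the abbreviation, can be illustrated with a minimal sketch. This is not the authors' code. The function name `context_window`, the `PAD` token, and the example sentence are hypothetical. The abstract does not say how the four units are split between the two sides, so the sketch assumes two on each side and exposes that as the `per_side` parameter.

```python
from typing import List, Tuple

PAD = "<PAD>"  # hypothetical filler for positions outside the sentence


def context_window(
    tokens: List[str], target_index: int, per_side: int = 2
) -> Tuple[List[str], List[str]]:
    """Return the `per_side` tokens to the left and right of the target.

    Assumption: four adjacent units means two on each side of the
    abbreviation; change `per_side` if a different split is intended.
    """
    if not 0 <= target_index < len(tokens):
        raise IndexError("target_index is outside the token list")
    left = tokens[max(0, target_index - per_side):target_index]
    right = tokens[target_index + 1:target_index + 1 + per_side]
    # Pad so that every example has exactly `per_side` units on each side.
    left = [PAD] * (per_side - len(left)) + left
    right = right + [PAD] * (per_side - len(right))
    return left, right


# Illustrative sentence with the ambiguous abbreviation "CA".
sentence = "the patient was given CA supplements after the scan".split()
left, right = context_window(sentence, sentence.index("CA"))
print(left, right)
```

In the method described above, each unit in the window would additionally carry its part of speech and semantic information; the sketch covers only the word forms.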