
Named Entity Recognition of Enterprise Annual Report Integrated with BERT


ZHANG Jingyi1, HE Guanghui1(), DAI Zhou2, LIU Yadong1
1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
2. China Southern Power Grid Materials Co., Ltd., Guangzhou 510641, China
Received: 2020-01-08  Online: 2021-02-01  Published: 2021-03-03
Contact: HE Guanghui  E-mail: guanghui.he@sjtu.edu.cn
About the author: ZHANG Jingyi (1996-), female, born in Nanyang, Henan Province; M.S. candidate, mainly engaged in research on natural language processing.








Abstract


Abstract: Automatically extracting key data from enterprise annual reports is an important means of automating enterprise assessment. Aiming at the characteristics of key entities in the field of enterprise annual reports, namely their complex structure, strong semantic association with context, and small corpus scale, a BERT-BiGRU-Attention-CRF (bidirectional encoder representations from transformers, bidirectional gated recurrent unit, attention mechanism, conditional random field) model is proposed. Based on the BiGRU-CRF model, the BERT pre-trained language model is first introduced to enhance the generalization ability of the word vector model and to capture long-range contextual information; the attention mechanism is then introduced to fully mine the global and local features of the text. Experiments on a self-constructed enterprise annual report corpus compare this model with several traditional models. The results show that the model achieves an F1 score (harmonic mean of precision and recall) of 93.69%, outperforms the other traditional models on named entity recognition in enterprise annual reports, and is expected to serve as an effective method for automating enterprise assessment.
Keywords: named entity recognition, enterprise annual report, BERT, attention mechanism, BiGRU
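The abstract evaluates the model with the F1 score, defined there as the harmonic mean of precision and recall. A minimal sketch of entity-level evaluation (the function names and the counts below are illustrative assumptions, not figures from the paper):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Entity-level precision and recall from true-positive,
    false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: 1488 correctly recognized entities,
# 100 spurious predictions, 100 missed entities.
p, r = precision_recall(1488, 100, 100)
print(round(f1_score(p, r), 4))  # prints 0.937
```

When precision and recall are equal, their harmonic mean equals either value; the paper reports an F1 of 93.69% on its self-constructed annual-report corpus.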

