Text Classification Algorithm for Structuring Large-Scale Judgment Documents

Posted by the site editor, Free考研考试, 2022-01-16

Authors: Weng Yang¹, Gu Songyuan², Li Jing¹, Wang Feng³, Li Junliang³, Li Xin² (翁洋, 谷松原, 李静, 王枫, 李俊良, 李鑫)
Unit: 1. College of Mathematics, Sichuan University, Chengdu 610064, China;
2. Law School, Sichuan University, Chengdu 610207, China;
3. Union Big Data Technology Co., Ltd., Chengdu 610041, China

Abstract: With big data and artificial intelligence (AI) established as national strategies, applying new technologies in the judicial field has become increasingly important, and the Supreme People's Court's push for the deep application of AI in the judiciary has created an opportunity for related research. Court informatization led by the Supreme People's Court, together with requirements such as judicial openness, has brought a large number of judgment documents online. As an important legal text resource, judgment documents contain a large amount of key trial information and have diverse research and application value. However, they also contain a large amount of unstructured information, which hinders accurate information extraction, so structuring judgment documents is an important prerequisite for research based on them. Given the massive number of documents online, manual processing would consume a great deal of time and effort, while the reform standardizing judgment documents provides a basis for applying AI in the judiciary. For the task of structuring judgment documents, existing approaches based on regular-expression matching or on text classification models fail to exploit the structural features of the labels of surrounding paragraphs in a document, and their structuring results are poor. To address this problem, we propose a sequence labeling method based on paragraph-level contextual semantic features of judgment documents. By learning the structure of paragraph labels across complete judgment documents and the relationships between paragraph contexts, the method structures judgment documents well. The results show that accuracy, recall, and F1 score all improve over the text classification baseline, yielding almost completely accurate classification. In addition, because the core of the method is the use of paragraph-level contextual semantic features, it can be extended to structuring tasks for various types of judgment documents.
Keywords: judgment documents; text structuring; pre-trained model

Full-text PDF download: http://xbzrb.tju.edu.cn/#/digest?ArticleID=6620
