A Text Classification Algorithm for Structuring Large-Scale Judgment Documents

Posted by site editor, Free考研考试 / 2022-01-16

翁 洋 ,谷松原 ,李 静 ,王 枫 ,李俊良,李 鑫
AuthorsHTML:翁 洋 1 ,谷松原 2 ,李 静 1 ,王 枫 3 ,李俊良 3 ,李 鑫 2
AuthorsListE:Weng Yang,Gu Songyuan,Li Jing,Wang Feng,Li Junliang,Li Xin
AuthorsHTMLE:Weng Yang1,Gu Songyuan2,Li Jing1,Wang Feng3,Li Junliang3,Li Xin2
Unit:1. 四川大学数学学院,成都 610064;
2. 四川大学法学院,成都 610207;
3. 数之联科技有限公司,成都 610041

Unit_EngLish:1. College of Mathematics, Sichuan University, Chengdu 610064, China;
2. Law School, Sichuan University, Chengdu 610207, China;
3. Union Big Data Technology Co., Ltd., Chengdu 610041, China

Abstract_Chinese:大数据和人工智能作为国家战略,使得新技术在司法领域应用的重要性凸显.同时,最高人民法院推动人工智能在司法领域的深度应用为相关研究提供了契机.最高人民法院主导的信息化建设以及司法公开等需求使得大量的裁判文书上网,裁判文书作为重要的法律文本信息资源,包含大量关键的案件审判信息,具有多元化的研究与应用价值.然而,裁判文书中存在着大量非结构化信息,妨碍了信息的准确抽取.对裁判文书进行结构化处理是基于裁判文书开展研究的重要前提.海量的裁判文书上网,人工处理将耗费大量的时间和精力,而裁判文书规范化改革为人工智能的司法应用提供基础.针对裁判文书结构化任务,已有的正则匹配方法或者基于文本分类模型的研究方法,未能利用文书上下文段落标签的结构特征,结构化效果较差.针对这一问题,提出了一种基于裁判文书段落级别的上下文语义特征信息的序列标注模型方法.通过学习完整的裁判文书中段落标签的结构信息、段落上下文之间的联系,实现良好的裁判文书结构化效果.结果表明:准确率、召回率和 F1 值较文本分类的基线模型有了全面提高,得到了几乎完全准确的分类效果.另外,本文采取的结构化方法核心在于利用裁判文书段落级别的上下文语义特征信息,该方法可以推广到各种类型的裁判文书的结构化任务.
Abstract_English:As national strategies, big data and artificial intelligence (AI) have made the application of new technologies in the judicial field increasingly important. The Supreme People's Court is also promoting the in-depth application of AI in the judicial field, which provides an opportunity for related research. The informatization drive led by the Supreme People's Court, together with the demand for judicial openness, has brought a large number of judgment documents online. As an important legal text resource, these documents contain a large amount of key trial information and have diverse research and application value. However, they also contain a large amount of unstructured information, which hinders accurate information extraction. Structuring judgment documents is therefore an important prerequisite for research based on them. Given the massive number of judgment documents online, manual processing would consume a great deal of time and effort, while the standardization reform of judgment documents provides a basis for applying AI in the judicial field. For the task of structuring judgment documents, existing regular-expression matching methods and methods based on text classification models fail to exploit the structural features of paragraph labels in the document context, and therefore yield poor structuring results. To address this problem, we propose a sequence labeling method based on paragraph-level contextual semantic features of judgment documents. By learning the structural information of paragraph labels in complete judgment documents and the relationships between neighboring paragraphs, the method structures judgment documents well. The results show that accuracy, recall, and F1 score all improve over the text classification baseline models, yielding almost completely accurate classification. In addition, because the core of the proposed method is the use of paragraph-level contextual semantic features of judgment documents, the method can be extended to structuring tasks for various types of judgment documents.
Keyword_Chinese:裁判文书;文本结构化;预训练模型
Keywords_English:judgment texts; text structuring; pre-trained model
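
The abstract contrasts classifying each paragraph on its own with labeling the whole paragraph sequence, but it does not describe the model itself. The following is only a minimal sketch of that contrast, not the authors' method: the label set, the per-paragraph log-scores, and the rule that sections may only stay the same or move forward are all illustrative assumptions, and plain Viterbi decoding stands in for whatever sequence model the paper actually uses.

```python
# Illustrative sketch only: labels, scores, and transition rules are
# made-up assumptions, not taken from the paper.
import math

LABELS = ["header", "parties", "facts", "reasoning", "verdict"]

# Assumed structure: a document may stay in a section or move to a later one.
TRANSITIONS = {
    (a, b): 0.0 for i, a in enumerate(LABELS) for b in LABELS[i:]
}


def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence for one document.

    emissions:   list of dicts, one per paragraph, label -> log-score
    transitions: dict (prev_label, label) -> log-score; missing pairs
                 are treated as forbidden (-inf)
    """
    best = [dict(emissions[0])]  # best score of a path ending in each label
    back = []                    # back-pointers, one dict per step
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for lab in LABELS:
            prev, s = max(
                ((p, best[-1][p] + transitions.get((p, lab), -math.inf))
                 for p in LABELS),
                key=lambda pair: pair[1],
            )
            scores[lab] = s + emissions[t][lab]
            pointers[lab] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def para(**known):
    """Hypothetical classifier output: unspecified labels get a low score."""
    return {lab: known.get(lab, -3.0) for lab in LABELS}


emissions = [
    para(header=-0.1),
    para(parties=-0.2),
    para(facts=-0.3),
    para(parties=-0.5, facts=-0.9),  # ambiguous paragraph
    para(reasoning=-0.1),
    para(verdict=-0.1),
]

independent = [max(e, key=e.get) for e in emissions]  # paragraph by paragraph
decoded = viterbi(emissions, TRANSITIONS)             # whole sequence at once

print("independent:", independent)
print("sequence:   ", decoded)
```

With these made-up scores, the independent classifier labels the fourth paragraph "parties" even though it follows a "facts" paragraph, while sequence decoding assigns it "facts" because the assumed ordering rule forbids moving backward.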

Full-text PDF download: http://xbzrb.tju.edu.cn/#/digest?ArticleID=6620