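Ying WANG2,
Li QIAN2, 3,
Ying WANG1
1. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
2. National Science Library, Chinese Academy of Sciences, Beijing 100190, China
3. Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China
Funds: National Natural Science Foundation of China (61702038), National Social Science Foundation of China (15CTQ006)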
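Details
Author biographies: Xiaoguang SHENG: male, born in 1989, PhD candidate; research interests: educational data mining and artificial intelligence
Ying WANG: female, born in 1982, associate research librarian; research interests: knowledge organization and knowledge mining
Li QIAN: male, born in 1981, research librarian; research interests: big data and machine intelligence
Ying WANG: female, born in 1969, professor; research interests: digital signal processing and educational data mining
Corresponding author: Xiaoguang SHENG, shengxiaoguang@ucas.ac.cn
CLC number: TP391.1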
Publication history
Received: 2020-10-23
Accepted: 2021-11-04
Revised: 2021-09-23
Published online: 2021-11-10
Issue published: 2021-12-21
Author Name Disambiguation Based on Semi-supervised Learning with Graph Convolutional Network
Xiaoguang SHENG1,
Ying WANG2,
Li QIAN2, 3,
Ying WANG1
1. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
2. National Science Library, Chinese Academy of Sciences, Beijing 100190, China
3. Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China
Funds: National Natural Science Foundation of China (61702038), National Social Science Foundation of China (15CTQ006)
Abstract
Abstract: To match scholars accurately with their publications, this paper proposes an author name disambiguation method based on semi-supervised learning with a Graph Convolutional Network (GCN). The method applies the SciBERT pre-trained language model to the title and keywords of each paper to obtain a semantic embedding vector for the corresponding paper node. The authors and organizations of the papers are used to build the adjacency matrices of the papers' co-author network and co-organization network. Pseudo labels are collected from the co-author network to form positive and negative sample sets. The semantic embedding vectors, the adjacency matrices, and the positive and negative samples are fed to the GCN for semi-supervised learning, which yields paper node embeddings; these embeddings are then clustered to disambiguate the author names. Experimental results show that, on the experimental dataset, the method achieves better results than the other disambiguation methods it was compared with.
Key words: Name disambiguation / Graph Convolutional Network (GCN) / BERT language model
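The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy co-author sets, the random vectors standing in for SciBERT embeddings, the single untrained layer, and the rule that two papers form a positive pair when they share a co-author are all assumptions made for illustration. The symmetric normalization is the commonly used GCN propagation rule, which the abstract does not spell out.

```python
import numpy as np

def coauthor_adjacency(coauthors):
    """Assumed rule: A[i, j] = 1 when papers i and j share at least one co-author."""
    n = len(coauthors)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if coauthors[i] & coauthors[j]:
                adj[i, j] = adj[j, i] = 1.0
    return adj

def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, features, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ features @ weight, 0.0)

# Hypothetical toy data: co-authors of four papers that carry the same ambiguous name.
coauthors = [{"A. Li", "B. Chen"}, {"B. Chen"}, {"C. Zhao"}, {"C. Zhao", "D. Wu"}]
adj = coauthor_adjacency(coauthors)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))  # stand-in for SciBERT title/keyword vectors
weight = rng.normal(size=(8, 3))    # untrained layer weights, for shape illustration only
embeddings = gcn_layer(normalize(adj), features, weight)

# Assumed pseudo-label rule: linked papers are positive pairs, unlinked papers negative pairs.
positive_pairs = [(int(i), int(j)) for i, j in zip(*np.nonzero(np.triu(adj)))]
negative_pairs = [(i, j) for i in range(4) for j in range(i + 1, 4) if adj[i, j] == 0]
```

In a full system the weights would be trained on the positive and negative pairs, and the resulting embeddings would be passed to a clustering step; both are omitted here.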
PDF full-text download:
https://jeit.ac.cn/article/exportPdf?id=284575fe-32bc-4db1-972a-802c0e8ba557