
An Ensemble Approach for Record Matching in Data Linkage (2016), The Chinese University of Hong Kong

The Chinese University of Hong Kong | Furen Net (辅仁网), 2017-06-21

Refereed conference paper presented and published in conference proceedings


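CUHK researchers (current staff)
Professor 刘玉麟 (Department of Medicine and Therapeutics)
Dr 陈锦良 (School of Chinese Medicine)
Mr 张海艺 (Hong Kong Institute of Integrative Medicine)
Professor 莫仲棠 (Department of Medicine and Therapeutics)
Professor 胡志远 (Department of Medicine and Therapeutics)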



Other information

Abstract:
Objectives: To develop and test an optimal ensemble configuration of two complementary probabilistic data matching techniques, Fellegi-Sunter (FS) and Jaro-Winkler (JW), with the goal of improving record matching accuracy.
Methods: Experiments and comparative analyses were carried out to compare the matching performance of ensemble configurations combining FS and JW against that of each technique applied on its own.
Results: Our results show that an improvement can be achieved when the FS technique is applied to the records that remain unsure or unmatched after the JW technique has been applied.
Discussion: Whilst all data matching techniques rely on the quality of a diverse set of demographic data, the FS technique focuses on aggregating matching accuracy across a number of useful variables, whereas JW looks more closely at matching the content of each field (in this case, its spelling). The two techniques are therefore complementary. In addition, the order in which they are applied is critical.
Conclusion: We have demonstrated a useful ensemble approach that has the potential to improve data matching accuracy, particularly when the number of demographic variables is limited. It is especially useful when fields such as names and addresses have multiple acceptable spellings.
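The abstract describes the ensemble only at a high level. The sketch below is a minimal illustration, not the authors' implementation, of the ordering reported as most effective: Jaro-Winkler is applied to a name field first, and Fellegi-Sunter weights are computed only for pairs that JW does not accept. The JW stage uses the standard Jaro-Winkler formula (prefix scale 0.1, prefix capped at 4 characters). The field names (`name`, `dob`, `sex`), the m/u probabilities in `EXAMPLE_PARAMS`, and both thresholds are hypothetical values chosen for illustration; the paper's actual variables and cut-offs are not given here.

```python
from math import log2


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Equal characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    seq1 = [c for c, u in zip(s1, used1) if u]
    seq2 = [c for c, u in zip(s2, used2) if u]
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score boosted by the length of the common prefix (up to max_prefix)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


# Hypothetical per-field (m, u) probabilities: m = P(agree | true match),
# u = P(agree | non-match). Real values would be estimated from the data.
EXAMPLE_PARAMS = {"name": (0.95, 0.01), "dob": (0.97, 0.003), "sex": (0.98, 0.5)}


def fs_weight(rec_a: dict, rec_b: dict, params: dict) -> float:
    """Fellegi-Sunter composite weight: sum of per-field log2 likelihood ratios."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            total += log2(m / u)  # agreement weight
        else:
            total += log2((1 - m) / (1 - u))  # disagreement weight
    return total


def ensemble_match(rec_a, rec_b, params=EXAMPLE_PARAMS,
                   jw_threshold=0.95, fs_threshold=3.0):
    """JW on the name field first; FS only for pairs JW leaves unsure or unmatched.

    Both thresholds are hypothetical, not values from the paper.
    """
    if jaro_winkler(rec_a["name"], rec_b["name"]) >= jw_threshold:
        return ("match", "JW")
    weight = fs_weight(rec_a, rec_b, params)
    return ("match" if weight >= fs_threshold else "non-match", "FS")


if __name__ == "__main__":
    a = {"name": "MARTHA", "dob": "1970-03-02", "sex": "F"}
    b = {"name": "MARHTA", "dob": "1970-03-02", "sex": "F"}
    c = {"name": "DIXON", "dob": "1955-07-11", "sex": "M"}
    d = {"name": "DICKSONX", "dob": "1955-07-11", "sex": "M"}
    print(ensemble_match(a, b))  # ('match', 'JW'): transposed spelling accepted by JW
    print(ensemble_match(c, d))  # ('match', 'FS'): JW below threshold, FS agrees on dob and sex
```

In this sketch, the JW stage catches spelling variants that an exact-agreement FS comparison would penalise. The FS stage then rescues pairs whose names differ too much for JW but which agree on other demographic fields, and this ordering mirrors the sequence the abstract describes. A fuller FS stage would typically also define a clerical-review band between an upper and a lower weight threshold.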

Authors: Poon SK, Poon J, Lam MK, Yin QL, Sze DMY, Wu JCY, Mok VCT, Ching JYL, Chan KL, Cheung WHN, Lau AY
Conference name: 24th Australian National Health Informatics Conference (HIC)
Conference start date: 01.01.2016
Conference location: Melbourne
Conference country: Australia
Year of publication: 2016
Volume: 227
Publisher: IOS Press
Pages: 113-119
ISBN: 978-1-61499-665-1
eISBN: 978-1-61499-666-8
ISSN: 0926-9630
Language: British English

Keywords: Data Linkage; Fellegi-Sunter; Jaro-Winkler; probabilistic data matching
Web of Science subject category: Medical Informatics
