
A Homology Detection Technology Based on Improved Edit Distance and LCS


Submitted: 2014-04-21
DOI:10.15918/j.tbit1001-0645.2017.02.011
Keywords: homology detection; edit distance; longest common sequence; structured information; code variants
Funding: Supported by the Electronic Information Industry Development Fund (工信部财函[2011]506号)
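Author: Liu Yunlong (刘云龙)
Affiliation: Computer and Microelectronics Development Research Center, Ministry of Industry and Information Technology, Beijing 100048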
Abstract:
Traditional token-based homology detection algorithms have difficulty locating the structured information of code variants, extract and identify modules poorly, and measure homology with low precision. To address these problems, a structure-aware homology detection technique is proposed, based on an improved edit distance algorithm and an improved longest common sequence (LCS) algorithm. In the edit distance calculation, an exchange operator is introduced to improve the accuracy of homology measurement within modules. In the LCS algorithm, a minimum-size monitoring mechanism for similar modules and a maximum dynamic correlation measure between code lines are introduced, providing code structure boundary division, line association within modules, and extraction of effective structured information. Experiments show that the method is an effective and stable homology detection technique based on structured information, and that its random-sampling detection results perform well in precision, recall, and F value.
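The abstract does not give the exact definitions of the operators, so the following Python sketch only illustrates the general idea under stated assumptions. The "exchange operator" is assumed here to be an adjacent-token transposition of cost 1 (the optimal string alignment variant of edit distance), line association is approximated by a similarity threshold, and the minimum-size mechanism is assumed to discard match sets smaller than `min_size`. All function names, the `threshold` value, and the `min_size` value are hypothetical and are not taken from the paper.

```python
def edit_distance_with_swap(a, b):
    """Edit distance over two token sequences with insert, delete, substitute,
    and adjacent-swap operations, each of cost 1 (optimal string alignment)."""
    m, n = len(a), len(b)
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete
                          d[i][j - 1] + 1,         # insert
                          d[i - 1][j - 1] + cost)  # match or substitute
            # Assumed "exchange" step: two adjacent tokens appear swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def line_similarity(a, b):
    """Normalise the distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance_with_swap(a, b) / longest


def matched_lines(lines_a, lines_b, threshold=0.7, min_size=2):
    """LCS over tokenised lines, where two lines count as equal when their
    similarity reaches `threshold`. Returns (i, j) index pairs, or an empty
    list when fewer than `min_size` lines are associated."""
    m, n = len(lines_a), len(lines_b)
    # t[i][j] holds the LCS length of the suffixes lines_a[i:] and lines_b[j:].
    t = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if line_similarity(lines_a[i], lines_b[j]) >= threshold:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    # Walk the table forward to recover the associated line pairs.
    pairs, i, j = [], 0, 0
    while i < m and j < n:
        if line_similarity(lines_a[i], lines_b[j]) >= threshold:
            pairs.append((i, j))
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs if len(pairs) >= min_size else []
```

In this sketch a plain Levenshtein distance would charge two substitutions for a pair of swapped adjacent tokens, while the extra step charges one, which is one plausible reading of how an exchange operator could sharpen the measure for reordered code variants.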