基于离线汇编指令流分析的恶意程序算法识别技术

删除或更新信息，请邮件至freekaoyan#163.com(#换成@)

清华大学辅仁网/2017-07-07

赵晶玲^1,2, 陈石磊^1,2, 曹梦晨^1,2, 崔宝江^1,2

1. 北京邮电大学计算机学院, 北京 100876;
2. 移动互联网安全技术国家工程实验室, 北京 100876

Malware algorithm recognition based on offline instruction-flow analyse

ZHAO Jingling^1,2, CHEN Shilei^1,2, CAO Mengchen^1,2, CUI Baojiang^1,2

1. School of Computer, Beijng University of Post and Telecommunications, Beijing 100876, China;
2. National Engineering Lab for Mobile Network Security, Beijing 100876, China

摘要:

输出: BibTeX | EndNote (RIS)

摘要识别二进制程序中的算法, 在恶意程序检测、软件分析、网络传输分析、计算机系统安全保护等领域有着广泛的应用和重要的意义。该文提出基于离线汇编指令流分析的恶意代码算法识别技术, 综合运用二进制插桩、污点跟踪、循环识别等技术, 从行为语义、关键常数2个维度对程序进行描述, 并且分析提取特征。算法识别模型使用机器学习算法, 针对双维度特征生成初阶识别模型, 并通过模型融合优化识别效果, 实现对广义程序算法的高准确率识别。

关键词 ：算法识别,污点跟踪,机器学习,恶意程序检测

Abstract：Binary program algorithm identification is widely used for malware detection, software analyse, network encryption analyse and computer system protection. This paper describes a malware algorithm recognition method using offline instruction-flow analyses using binary instrumentation, taint traces, and loop recognition. The algorithm features are described including the behavior semantics and key constants extracted from the instruction-flow algorithm. Two machine learning models trained by these features are merged into one accurate recognition algorithm.

Key words：algorithm recognition taint trace machine learning malware detection

收稿日期: 2016-01-24 出版日期: 2016-05-19

ZTFLH:

TP301.6

引用本文:

赵晶玲, 陈石磊, 曹梦晨, 崔宝江. 基于离线汇编指令流分析的恶意程序算法识别技术[J]. 清华大学学报（自然科学版）, 2016, 65(5): 484-492.
ZHAO Jingling, CHEN Shilei, CAO Mengchen, CUI Baojiang. Malware algorithm recognition based on offline instruction-flow analyse. Journal of Tsinghua University(Science and Technology), 2016, 65(5): 484-492.

链接本文:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.25.005或 http://jst.tsinghuajournals.com/CN/Y2016/V65/I5/484

图表:

图１　离线分析框架流程

图２　循环结构控制流示意

表１　循环嵌套后向搜索算法

图３　算法识别模型融合过程示意

表２　行为语义轮廓初阶识别模型结果

表３　关键常数初阶识别模型结果

表４　仲裁模型的函数算法功能识别结果

图４　模型测试结果对比

参考文献:

[1] Vyacheslav Zakorzhevsk. 卡巴斯基实验室每天检测到32.5万个最新恶意文件[Z/OL].[2014-12-03] . http://news.kaspersky.com.cn/news2014/12n/141203.htm. Vyacheslav Zakorzhevsk. 325, 000 new malicious files detected by Kabasiji labs every day[Z/OL].[2014-12-03] . http://news.kaspersky.com.cn/news2014/12n/141203.htm. (in Chinese)
[2] Calvet J, Fernandez J M, Marion J Y. Aligot:Cryptographic function identification in obfuscated binary programs[C]//Proceedings of the 2012 ACM Conference on Computer and Communications Security. New York, USA:ACM, 2012:169-182.
[3] Leder F, Martini P, Wichmann A. Finding and extracting crypto routines from malware[C]//Performance Computing and Communications Conference (IPCCC), 2009 IEEE 28th International. Piscataway, NJ:IEEE Press, 2009:394-401.
[4] Cui B, Wang F, HaoY, et al. A taint based approach for automatic reverse engineering of gray-box file formats[J].Soft Computing, 2015:1-16.
[5] Wang Z, Jiang X, Cui W, et al. ReFormat:Automatic reverse engineering of encrypted messages[C]//Proceedings of the 14th European Conference on Research in Computer Security. Berlin, GER:Springer-Verlag, 2008:200-215.
[6] Lutz N. Towards revealing attackers intent by automatically decrypting network traffic[J]. Eth Zuerich, 2008(8):1-52.
[7] 李继中, 蒋烈辉, 舒辉, 等. 基于动态数据流的密码函数加解密过程分析[J]. 计算机应用研究, 2014,31(4):1185-1188. LI Jizhong, JIANG Liehui, SHU Hui, et al. Analysis of encryption and decryption process among crypto functions based on dynamic data-flow[J].Application Research of Computer, 2014,31(4):1185-1188. (in Chinese)
[8] Gr bert F, Willems C, Holz T. Automated identification of cryptographic primitives in binary programs[J].Lecture Notes in Computer Science, 2011,6961:41-60.
[9] 张经纬, 舒辉, 蒋烈辉, 等. 公钥密码算法识别技术研究[J]. 计算机工程与设计, 2011,32(10):3243-3246. ZHANG Jingwei, SHU Hui, JIANG Liehui, et al. Research on public key's cryptography algorithm recognition technology[J].Computer Engineering and Desgin, 2011,32(10):3243-3246. (in Chinese)
[10] 李洋, 康绯, 舒辉. 基于动态二进制分析的密码算法识别[J]. 计算机工程, 2012, 38(17):106-109. LI Yang, KANG Fei, SHU Hui. Cryptographic algorithm recognition based on dynamic binary analysis[J].Computer Engineering, 2012,38(17):106-109. (in Chinese)
[11] Caballero J, Yin H, Liang Z, et al. Polyglot:Automatic extraction of protocol message format using dynamic binary analysis[C]//Proceedings of the 14th ACM Conference on Computer and Communications Security. New York, USA:ACM, 2007:317-329.
[12] Cui B, Wang F, Guo T, et al. A practical off-line taint analysis framework and its application in reverse engineering of file format[J].Computers & Security, 2015,51:1-15.
[13] 王乾. 基于动态二进制分析的关键函数定位技术研究[D]. 郑州:解放军信息工程大学, 2012. WANG Qian. Research on Locating of Key Functions Based on Dynamic Binary Analysis[D]. Zhengzhou:The PLA Information Engineering University, 2012. (in Chinese)
[14] 黎超. 基于切片的二进制代码可视化分析的研究[D]. 广州:广东工业大学, 2011 LI Chao. Research on Slicing-based Binary Executables Analysis Technology[D]. Guangzhou:Guangdong University of Technology, 2012. (in Chinese)
[15] 李雪莲. 基于PLS的加权朴素贝叶斯分类测试算法[J]. 电子质量, 2010(7):4-6. LI Xuelian. Weighted naive Bayes classification text algorithm based on partial least squares[J].Electronics Quality, 2010(7):4-6. (in Chinese)

[1]	刘泽文, 丁冬, 李春文. 基于条件随机场的中文短文本分词方法[J]. 清华大学学报（自然科学版）, 2015, 55(8): 906-910,915.