
Malware algorithm recognition based on offline assembly instruction-flow analysis


ZHAO Jingling1,2, CHEN Shilei1,2, CAO Mengchen1,2, CUI Baojiang1,2
1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
2. National Engineering Laboratory for Mobile Network Security, Beijing 100876, China

Abstract: Identifying the algorithms inside binary programs is widely applicable and important in malware detection, software analysis, network traffic analysis, and computer system security protection. This paper proposes a malware algorithm recognition technique based on offline assembly instruction-flow analysis. It combines binary instrumentation, taint tracking, and loop recognition to describe a program along two dimensions, behavior semantics and key constants, and extracts features from both. The recognition model uses machine learning: a first-stage recognition model is trained for each of the two feature dimensions, and the two models are then fused to refine the result, giving high-accuracy recognition of general program algorithms.
Key words: algorithm recognition; taint tracking; machine learning; malware detection
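As a rough illustration of the key-constant dimension named in the abstract, the following Python sketch counts well-known cryptographic constants in a textual instruction trace. The trace format (one instruction per line with hexadecimal immediates), the small constant table, and the function name `key_constant_features` are all assumptions made for this example; the paper's actual feature extraction is not described on this page and may differ.

```python
import re

# Illustrative signature table (an assumption, not taken from the paper).
# These values are the publicly documented MD5/SHA-1 initial state words
# and the TEA key-schedule constant.
KNOWN_CONSTANTS = {
    0x67452301: "MD5/SHA-1 initial state",
    0xEFCDAB89: "MD5/SHA-1 initial state",
    0x98BADCFE: "MD5/SHA-1 initial state",
    0x10325476: "MD5/SHA-1 initial state",
    0x9E3779B9: "TEA delta",
}

# Matches hexadecimal immediates such as 0x67452301 in a trace line.
HEX_IMMEDIATE = re.compile(r"\b0x([0-9a-fA-F]+)\b")


def key_constant_features(trace_lines):
    """Count, per label, how many known constants appear in the trace."""
    counts = {}
    for line in trace_lines:
        for match in HEX_IMMEDIATE.finditer(line):
            label = KNOWN_CONSTANTS.get(int(match.group(1), 16))
            if label is not None:
                counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical trace fragment used only to exercise the function.
trace = [
    "mov eax, 0x67452301",
    "mov ebx, 0xefcdab89",
    "add ecx, 0x9e3779b9",
    "xor edx, 0x10",
]
features = key_constant_features(trace)
print(features)
```

A count vector of this kind could serve as one input to a first-stage classifier, with behavior-semantic features feeding a second one. How the two are fused is left open here.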
Received: 2016-01-24 Published: 2016-05-19
CLC number: TP301.6
Cite this article:
ZHAO Jingling, CHEN Shilei, CAO Mengchen, CUI Baojiang. Malware algorithm recognition based on offline instruction-flow analysis[J]. Journal of Tsinghua University (Science and Technology), 2016, 65(5): 484-492.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.25.005
http://jst.tsinghuajournals.com/CN/Y2016/V65/I5/484


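Figures and tables:
Fig. 1 Workflow of the offline analysis framework
Fig. 2 Control flow of a loop structure
Table 1 Backward search algorithm for nested loops
Fig. 3 Fusion process of the algorithm recognition models
Table 2 Results of the first-stage recognition model based on behavior-semantic profiles
Table 3 Results of the first-stage recognition model based on key constants
Table 4 Function-level algorithm recognition results of the arbitration model
Fig. 4 Comparison of model test results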



