| Malware algorithm recognition based on offline instruction-flow analysis | 
| ZHAO Jingling1,2, CHEN Shilei1,2, CAO Mengchen1,2, CUI Baojiang1,2 | 
| 1. School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. National Engineering Lab for Mobile Network Security, Beijing 100876, China | 
Abstract:
| Identifying the algorithms implemented in binary programs is widely applicable and important in malware detection, software analysis, network traffic analysis, and computer system security protection. This paper proposes a malware algorithm recognition technique based on offline analysis of the assembly instruction flow. It combines binary instrumentation, taint tracking, and loop identification to describe a program along two dimensions, behavior semantics and key constants, and extracts features from each. The recognition model uses machine learning: a first-stage model is trained on each of the two feature dimensions, and the two models are then fused to improve the result, yielding high-accuracy recognition of general program algorithms. | 
| Key words: algorithm recognition, taint tracking, machine learning, malware detection | 
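The two feature dimensions and the fusion step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the trace line format (`address: instruction`), the function names `constant_features` and `fuse`, and the weight `w` are all assumptions made for illustration, and plain weighted averaging stands in for the paper's arbitration model, whose details are not given on this page. The constants themselves are the publicly documented initialization values of MD5/SHA-1 and SHA-256 and the TEA delta.

```python
import re

# Publicly documented constants; the grouping into "families" is illustrative.
KEY_CONSTANTS = {
    "MD5/SHA-1 init": {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476},
    "SHA-256 init": {0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A},
    "TEA delta": {0x9E3779B9},
}

HEX_RE = re.compile(r"0x[0-9a-fA-F]+")


def constant_features(trace_lines):
    """Count, per family, how many distinct known constants occur as immediates.

    Assumes each line looks like "401000: mov eax, 0x67452301" (hypothetical format).
    """
    seen = set()
    for line in trace_lines:
        # Drop the leading address so it is never mistaken for an operand.
        instr = line.partition(":")[2] or line
        for token in HEX_RE.findall(instr):
            seen.add(int(token, 16))
    return {name: len(consts & seen) for name, consts in KEY_CONSTANTS.items()}


def fuse(p_semantic, p_constant, w=0.5):
    """Late fusion by weighted averaging of two per-label probability dicts.

    This is a stand-in for the paper's model fusion, not a reproduction of it.
    """
    labels = set(p_semantic) | set(p_constant)
    fused = {
        label: w * p_semantic.get(label, 0.0) + (1 - w) * p_constant.get(label, 0.0)
        for label in labels
    }
    best = max(fused, key=fused.get)
    return best, fused
```

In this sketch `constant_features` would feed the key-constant model, while a separate behavior-semantic model (not shown) would produce `p_semantic`; `fuse` then combines the two predictions into a single label.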
| Received: 2016-01-24; Published: 2016-05-19 | 
| Cite this article: | 
| ZHAO Jingling, CHEN Shilei, CAO Mengchen, CUI Baojiang. Malware algorithm recognition based on offline instruction-flow analysis[J]. Journal of Tsinghua University (Science and Technology), 2016, 65(5): 484-492. | 
| Link to this article: | 
| http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.25.005 or http://jst.tsinghuajournals.com/CN/Y2016/V65/I5/484 | 
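Figures and tables:
| Figure 1 Workflow of the offline analysis framework | 
| Figure 2 Control flow of a loop structure | 
| Table 1 Backward search algorithm for nested loops | 
| Figure 3 Fusion process of the algorithm recognition models | 
| Table 2 Results of the first-stage recognition model based on behavior-semantic profiles | 
| Table 3 Results of the first-stage recognition model based on key constants | 
| Table 4 Function-level algorithm recognition results of the arbitration model | 
| Figure 4 Comparison of model test results | 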
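The contents of Figure 2 and Table 1 are not reproduced on this page, so the following Python sketch shows only a generic heuristic for finding loops in an executed-address trace, not the authors' backward search algorithm. It treats every transition to a lower or equal address as a candidate loop back edge and derives nesting from address-range containment. The names `find_loops` and `nesting` are hypothetical, and the sketch assumes the trace covers a single function with no calls or returns, since those can also move control to lower addresses.

```python
from collections import Counter


def find_loops(addr_trace):
    """Return {(head, tail): count} for backward transitions tail -> head."""
    back_edges = Counter()
    for prev, cur in zip(addr_trace, addr_trace[1:]):
        if cur <= prev:  # control moved backwards: candidate loop back edge
            back_edges[(cur, prev)] += 1
    return dict(back_edges)


def nesting(loops):
    """Map each loop to the other loops whose address range contains it."""
    return {
        (head, tail): [
            (outer_head, outer_tail)
            for (outer_head, outer_tail) in loops
            if (outer_head, outer_tail) != (head, tail)
            and outer_head <= head
            and tail <= outer_tail
        ]
        for (head, tail) in loops
    }
```

For a trace in which an inner loop over `0x18..0x20` runs inside an outer loop over `0x10..0x30`, `find_loops` reports both back edges with their traversal counts, and `nesting` lists the outer loop as the parent of the inner one.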
