
A Malicious Domain Name Detection Model Combining CNN and LSTM


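Bin ZHANG,
Renjie LIAO
1. PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China
2. Henan Key Laboratory of Information Security, Zhengzhou 450001, China
Funding: Henan Province Basic and Frontier Technology Research Program (142300413201); Open Fund of the Key Laboratory of Information Assurance Technology (KJ-15-109); Research Project of Information Engineering University (2019f3303)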

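Details
About the authors: Bin ZHANG: male, born in 1969, professor and doctoral supervisor; research interest: information system security
Renjie LIAO: male, born in 1996, master's student; research interest: machine-learning-based malicious domain name detection
Corresponding author: Renjie LIAO, lrj2803@163.com
CLC number: TN915.08; TP393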

Publication history

Received: 2020-08-04
Revised: 2020-12-13
Published online: 2021-02-06
Issue date: 2021-10-18

Malicious Domain Name Detection Model Based on CNN and LSTM

Bin ZHANG,
Renjie LIAO,
1. PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China
2. Henan Key Laboratory of Information Security, Zhengzhou 450001, China
Funds: The Foundation and Frontier Technology Research Project of Henan Province (142300413201), The Open Fund Project of Information Assurance Technology Key Laboratory (KJ-15-109), The Research Project of Information Engineering University (2019f3303)


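Abstract
Abstract: To improve the accuracy of malicious domain name detection, this paper proposes a domain name detection model that combines a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network. The model detects malicious domain names by extracting sequence features of character combinations of different lengths from the domain name string. First, to avoid the sparse distribution of N-Gram features, a CNN extracts character-combination features from the domain name string and converts them into dense vectors of fixed dimension. Second, to fully exploit the contextual information in the domain name string, an LSTM extracts deep sequence features that capture the dependencies between preceding and following character combinations. An attention mechanism is also introduced to assign smaller weights to the output features at padding-character positions, which reduces the interference of padding characters with feature extraction and strengthens the extraction of long-distance sequence features. Finally, the strength of the CNN in extracting local features is combined with that of the LSTM in extracting sequence features, yielding sequence features of character combinations of different lengths for domain name detection. Experiments show that the model achieves higher recall and F1 score than models using only a CNN or only an LSTM. In particular, its detection accuracy on the matsnu and suppobox classes of malicious domain names is 24.8% and 3.77% higher, respectively, than that of the LSTM-only model.
Keywords: Malicious domain name/
Convolutional Neural Network/
Long Short-Term Memory network/
Attention mechanism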
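
The N-Gram sparsity that motivates the CNN stage can be illustrated with a small sketch. It counts the character N-grams of one domain label and compares that count with the number of possible N-grams. The 37-symbol alphabet and the helper names `char_ngrams` and `ngram_density` are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter

# Assumed alphabet for a single domain label: lowercase letters, digits, hyphen.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"


def char_ngrams(label: str, n: int) -> Counter:
    """Count the character n-grams that occur in a domain label."""
    s = label.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_density(label: str, n: int) -> float:
    """Fraction of the possible n-gram vocabulary that is non-zero for this label."""
    vocab_size = len(ALPHABET) ** n  # size of a one-hot / bag-of-n-grams vector
    return len(char_ngrams(label, n)) / vocab_size


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, len(ALPHABET) ** n, ngram_density("example", n))
```

Under these assumptions the share of non-zero entries shrinks quickly as N grows, which is the kind of sparsity a fixed-size dense representation avoids.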
Abstract: To improve the accuracy of malicious domain name detection, a new detection model based on a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) is proposed. The model extracts sequence features from character combinations of different lengths to classify domain names. Firstly, to address the sparseness of N-Gram features, the model uses a CNN with different kernels to preserve the local associations between characters in the domain name string and convert them into dense feature vectors. Secondly, to mine the context information of the domain name string, an LSTM is used to extract deep sequence features of different character combinations. A sequence feature attention module is designed to assign small weights to the sequence features extracted from padding characters, which reduces the interference of the padding characters and enhances the ability to capture long-distance sequence features. Finally, the advantages of the CNN in extracting local features and of the LSTM in extracting sequence features are combined, so that both local and sequential information contribute to detection performance. Experimental results show that the recall and F1-score of the proposed model are superior to those of comparison models composed solely of a CNN or an LSTM. In particular, on the matsnu and suppobox classes of malicious domain names, the accuracy of the proposed model is 24.8% and 3.77% higher, respectively, than that of the LSTM-based model.
Key words:Malicious domain name/
Convolutional Neural Network (CNN)/
Long Short-Term Memory (LSTM)/
Attention mechanism
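
The effect of down-weighting padding positions can be sketched in plain Python. The abstract describes an attention module that assigns small weights to features at padded positions. The sketch below swaps in a simpler hard mask that forces those weights to exactly zero, so it is an illustration of the idea rather than the paper's module. The character set, the `PAD` and `UNK` ids, and all function names are assumptions, and the per-position scores and features stand in for values that would come from the LSTM.

```python
import math
from typing import List

# Assumed character set; id 0 is reserved for padding, the last id for unknown characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-."
PAD = 0
UNK = len(ALPHABET) + 1
VOCAB = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode(domain: str, max_len: int) -> List[int]:
    """Map characters to integer ids, truncate to max_len, and right-pad with PAD."""
    ids = [VOCAB.get(ch, UNK) for ch in domain.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def masked_attention_weights(scores: List[float], ids: List[int]) -> List[float]:
    """Softmax over per-position scores, with padded positions forced to weight 0."""
    masked = [s if i != PAD else -math.inf for s, i in zip(scores, ids)]
    peak = max(masked)
    if peak == -math.inf:
        raise ValueError("sequence contains only padding")
    exps = [math.exp(s - peak) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted sum of per-position feature vectors."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


if __name__ == "__main__":
    ids = encode("ab", 4)
    weights = masked_attention_weights([0.0, 0.0, 5.0, 5.0], ids)
    print(ids, weights)
    print(attention_pool([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [9.0, 9.0]], weights))
```

Even though the padded positions carry the largest raw scores in this usage example, they contribute nothing to the pooled vector, which is the interference-reduction effect the abstract attributes to the attention step.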



PDF full text:

https://jeit.ac.cn/article/exportPdf?id=b25667c7-c1b9-4023-98b4-343d2f22c079