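Dan LIU1, 2,
LiShuang YAO1, 2,
Yunfeng WANG1,
Zuofei PEI1, 2
1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Chongqing Key Laboratory of Mobile Communications Technology, Chongqing 400065, China
Funds: **** and Innovative Research Team Development Program (IRT_16R72)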
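Details
Author biographies: Hong TANG: male, born in 1967, professor. Research interests: computer networks and mobile communications.
Dan LIU: female, born in 1995, master's student. Research interests: network management and machine learning.
LiShuang YAO: female, born in 1993, master's student. Research interests: network management and machine learning.
Yunfeng WANG: male, born in 1992, master's student. Research interests: machine learning and data mining.
Zuofei PEI: male, born in 1994, master's student. Research interests: machine learning and data mining.
Corresponding author: Dan LIU, s170101113@stu.cqupt.edu.cn
CLC number: TP393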
Publication history
Received: 2019-12-11
Revised: 2021-02-22
Published online: 2021-03-04
Issue date: 2021-04-20
Feature Selection Algorithm for Class Imbalanced Internet Traffic
Hong TANG1, 2,
Dan LIU1, 2,
LiShuang YAO1, 2,
Yunfeng WANG1,
Zuofei PEI1, 2
1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Key Laboratory of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Funds: Changjiang Scholars and Innovative Research Team in University (IRT_16R72)
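Abstract
Abstract: To address the class imbalance problem that arises in network traffic classification, this paper proposes a feature selection algorithm based on Weighted Symmetric Uncertainty (WSU) and the Approximate Markov Blanket (AMB). First, a feature metric biased toward minority classes is defined from the class distribution, so that features strongly correlated with minority classes are more likely to be selected. Second, taking into account both feature-class and feature-feature correlations, WSU and AMB are used to remove irrelevant and redundant features. Finally, a correlation-based feature evaluation function and a sequential search algorithm further reduce the feature dimension and determine the optimal feature subset. Experiments show that the algorithm effectively improves classification performance on minority classes while preserving overall classification precision.
Key words: Traffic classification; Feature selection; Class imbalance; Weighted Symmetric Uncertainty (WSU); Approximate Markov Blanket (AMB)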
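The abstract does not give the exact definition of WSU, so the following is only a minimal sketch of one way a class-weighted symmetric uncertainty could be computed. Standard symmetric uncertainty is SU(X, Y) = 2 * [H(X) + H(Y) - H(X, Y)] / [H(X) + H(Y)]. The weighting scheme used here (each sample weighted by the inverse frequency of its class) and the function names `weighted_entropy` and `weighted_symmetric_uncertainty` are assumptions made for illustration, not the authors' formulation.

```python
from collections import Counter
from math import log2


def weighted_entropy(values, weights):
    """Shannon entropy of discrete `values`, where each sample counts with its weight."""
    total = sum(weights)
    mass = Counter()
    for value, weight in zip(values, weights):
        mass[value] += weight
    return -sum((m / total) * log2(m / total) for m in mass.values() if m > 0)


def weighted_symmetric_uncertainty(feature, labels):
    """Symmetric uncertainty between a discrete feature and the class labels.

    ASSUMPTION: samples are weighted by 1 / (size of their class), so every
    class carries the same total mass and minority classes are not swamped.
    This weighting is illustrative and is not taken from the paper.
    """
    class_counts = Counter(labels)
    weights = [1.0 / class_counts[y] for y in labels]
    h_x = weighted_entropy(feature, weights)
    h_y = weighted_entropy(labels, weights)
    h_xy = weighted_entropy(list(zip(feature, labels)), weights)
    if h_x + h_y == 0:
        return 0.0
    return 2.0 * (h_x + h_y - h_xy) / (h_x + h_y)


# Toy imbalanced data: 8 samples of class "a", 2 samples of class "b".
labels = ["a"] * 8 + ["b"] * 2
perfect_feature = list(labels)   # fully determines the class
constant_feature = [0] * 10      # carries no class information
su_perfect = weighted_symmetric_uncertainty(perfect_feature, labels)
su_constant = weighted_symmetric_uncertainty(constant_feature, labels)
```

With this weighting, a feature that fully determines the class scores 1 and a constant feature scores 0, regardless of how skewed the class sizes are.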
Abstract: Class imbalance is common in network traffic classification. To address this problem, a new feature selection algorithm using Weighted Symmetric Uncertainty (WSU) and Approximate Markov Blanket (AMB) is proposed. First, a feature metric biased toward minority classes is defined using class distribution information, which makes it easier to pick out features that are strongly correlated with minority classes. Then, considering both feature-class and feature-feature correlations, WSU and AMB are used to delete irrelevant and redundant features. Finally, the feature dimension is further reduced and the optimal feature subset is determined by using a correlation-based feature evaluation function and a sequential search algorithm. The experimental results demonstrate that the algorithm can effectively improve the classification performance of minority classes without sacrificing overall classification accuracy.
Key words: Traffic classification; Feature selection; Class imbalance; Weighted Symmetric Uncertainty (WSU); Approximate Markov Blanket (AMB)
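The redundancy-removal step can be illustrated with the usual approximate Markov blanket criterion: a feature F_i approximately blankets F_j when F_i is at least as relevant to the class as F_j and the correlation between F_i and F_j is at least the relevance of F_j. The sketch below applies that criterion with plain (unweighted) symmetric uncertainty as the correlation measure. The function name `remove_redundant`, the `threshold` parameter, and the dictionary-of-columns input format are hypothetical choices for this example; the paper's own procedure uses WSU and may differ in detail.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y))."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0
    h_xy = entropy(list(zip(x, y)))
    return 2.0 * (h_x + h_y - h_xy) / (h_x + h_y)


def remove_redundant(features, labels, threshold=0.0, measure=symmetric_uncertainty):
    """Drop irrelevant features, then drop features covered by an approximate Markov blanket.

    `features` maps a feature name to its list of discrete values.
    `measure` is pluggable so a class-weighted variant could be substituted.
    """
    relevance = {name: measure(col, labels) for name, col in features.items()}
    # Keep only features whose relevance exceeds the threshold, most relevant first.
    ranked = sorted(
        (name for name in features if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in ranked:
        # Every kept feature is at least as relevant as `name`, so `name` is
        # redundant if its correlation with a kept feature reaches its own relevance.
        blanketed = any(
            measure(features[kept], features[name]) >= relevance[name]
            for kept in selected
        )
        if not blanketed:
            selected.append(name)
    return selected


labels = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "f1": [0, 0, 1, 1, 0, 1, 0, 1],   # informative
    "f2": [0, 0, 1, 1, 0, 1, 0, 1],   # exact copy of f1, hence redundant
    "f3": [7, 7, 7, 7, 7, 7, 7, 7],   # constant, hence irrelevant
}
kept = remove_redundant(features, labels)  # ['f1']
```

Passing a different `measure` leaves the selection loop unchanged, which is why the correlation function is kept as a parameter rather than hard-coded.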
Full-text PDF download link:
https://jeit.ac.cn/article/exportPdf?id=9108a0ef-6a29-4e0f-a715-4e3b052e382e