Details
Author biographies: Li ZHANG: male, born in 1977, Ph.D., associate professor; research interests include feature selection and machine learning.
Xiaobo CHEN: female, born in 1980, M.S., engineer; research interests include feature selection and machine learning.
Corresponding author: Li ZHANG, zhangli_3913@163.com
CLC Number: TN911.7
Publication History
Received: 2020-07-23
Revised: 2021-02-05
Published online: 2021-03-19
Issue date: 2021-10-18
Feature Selection Algorithm Based on Dynamically Weighted Conditional Mutual Information
Li ZHANG1,2, Xiaobo CHEN3
1. College of Computer Engineering, Jiangsu University of Technology, Changzhou 213001, China
2. Key Laboratory of Trustworthy Distributed Computing and Service (Ministry of Education), Beijing University of Posts and Telecommunications, Beijing 100876, China
3. The People's Bank of China, Changzhou Branch, Changzhou 213001, China
Funds: The National Science and Technology Basic Work Project (2015FY111700-6), The Doctoral Research Fund of Jiangsu University of Technology (KYY19042)
Abstract: Feature selection is an essential step in the data-preprocessing phase of machine learning, natural language processing, and data mining. In some feature selection algorithms based on information theory, choosing different parameters amounts to choosing a different feature selection algorithm, so determining dynamic, non-prior weights while avoiding preset prior parameters becomes an urgent problem. This paper proposes a dynamically Weighted Maximum Relevance and maximum Independence (WMRI) feature selection algorithm. First, the algorithm computes the average values of the new classification information and the retained classification information. Second, the standard deviation is used to dynamically adjust the parameter weights of these two types of classification information. Finally, WMRI is compared with five other feature selection algorithms on three classifiers over ten different datasets, using the classification accuracy metric (fmi). The experimental results show that WMRI improves the quality of the selected feature subset and increases classification accuracy.
Keywords: Feature selection; Classification information; Average value; Standard deviation; Dynamic weighting
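The abstract describes scoring candidate features from two sets of conditional-mutual-information terms ("new" and "retained" classification information), with weights adjusted dynamically from each set's mean and standard deviation. The paper's exact scoring formula is not reproduced here, so the sketch below is an illustrative assumption: it estimates mutual information on discrete data and combines the two term sets with hypothetical std-derived weights, not the authors' exact WMRI criterion.

```python
import math
from collections import Counter


def entropy(*vars_):
    """Shannon entropy (in bits) of the joint distribution of one or more
    discrete variables, each given as an equal-length sequence."""
    joint = list(zip(*vars_))
    n = len(joint)
    return -sum((c / n) * math.log2(c / n) for c in Counter(joint).values())


def mi(x, y):
    """Mutual information I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cmi(x, y, z):
    """Conditional mutual information I(X; Y | Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def wmri_score(f, c, selected, features):
    """Hypothetical WMRI-style score for candidate feature f given class
    labels c and already-selected feature indices. 'New' information is
    approximated by I(f; C | f_j) and 'retained' information by
    I(f_j; C | f); the way mean and standard deviation enter the weights
    below is an illustrative assumption, not the paper's formula."""
    if not selected:
        return mi(f, c)
    new_info = [cmi(f, c, features[j]) for j in selected]   # I(f; C | f_j)
    kept_info = [cmi(features[j], c, f) for j in selected]  # I(f_j; C | f)

    def stats(v):
        m = sum(v) / len(v)
        sd = math.sqrt(sum((x - m) ** 2 for x in v) / len(v))
        return m, sd

    m1, s1 = stats(new_info)
    m2, s2 = stats(kept_info)
    # Assumed dynamic weighting: a larger spread shrinks that term's weight.
    w1 = 1.0 / (1.0 + s1)
    w2 = 1.0 / (1.0 + s2)
    return mi(f, c) + w1 * m1 - w2 * m2
```

In a greedy forward search, one would evaluate `wmri_score` for every unselected feature at each step and add the highest-scoring one; the key point mirrored from the abstract is that the weights `w1` and `w2` are recomputed from the current term statistics rather than fixed in advance.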
Full-text PDF download:
https://jeit.ac.cn/article/exportPdf?id=d2fac0db-fcf8-49ad-b1bf-fa1c347f3b77