Feature selection algorithm for text classification based on improved mutual information

Posted by site editor, Harbin Institute of Technology / 2019-10-23

CONG Shuai, ZHANG Ji-bin, XU Zhi-ming, WANG Yu-ying

School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China



Abstract:

To address the poor text classification performance obtained with the traditional mutual information (MI) formula, a feature selection algorithm based on improved mutual information is proposed. The algorithm builds on existing improved mutual information methods, which enhance the MI value of negative features and the role of feature frequency, and introduces the concepts of concentration degree and dispersion degree. Formulas that embody concentration degree and dispersion degree were constructed, and the improved mutual information was implemented on the basis of these formulas. The resulting feature selection algorithm was applied to a text classifier based on Biomimetic Pattern Recognition and compared with several other feature selection methods. The experimental results show that the improved mutual information feature selection method greatly enhances performance compared with traditional mutual information feature selection methods, and that it also outperforms information gain. By introducing the concepts of concentration degree and dispersion degree, the improved mutual information feature selection method greatly improves the performance of the text classification system.
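For orientation, the traditional MI baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of the textbook formula MI(t, c) = log(P(t, c) / (P(t) P(c))) estimated from document frequencies, not the authors' improved method, whose concentration and dispersion formulas are not given here. The function names `mutual_information` and `select_features`, the max-over-classes ranking, and the toy corpus in the usage note are assumptions made for this sketch.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Score (term, class) pairs with the traditional MI formula.

    MI(t, c) = log(P(t, c) / (P(t) * P(c))), with probabilities estimated
    from document frequencies. Pairs that never co-occur are not scored
    (their MI would be minus infinity).
    """
    n = len(docs)
    class_count = Counter(labels)   # documents per class
    term_count = Counter()          # documents containing each term
    joint_count = Counter()         # documents of class c containing term t
    for tokens, cls in zip(docs, labels):
        for term in set(tokens):    # count each term once per document
            term_count[term] += 1
            joint_count[(term, cls)] += 1

    scores = {}
    for (term, cls), n_tc in joint_count.items():
        # (n_tc / n) / ((n_t / n) * (n_c / n)) simplifies to n_tc * n / (n_t * n_c)
        ratio = n_tc * n / (term_count[term] * class_count[cls])
        scores[(term, cls)] = math.log(ratio)
    return scores


def select_features(docs, labels, k):
    """Keep the k terms with the highest MI over any class (an assumed ranking rule)."""
    best = {}
    for (term, _cls), score in mutual_information(docs, labels).items():
        best[term] = max(best.get(term, float("-inf")), score)
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:k]
```

On a hypothetical four-document corpus such as `[["goal", "match"], ["goal", "team"], ["stock", "market"], ["stock", "team"]]` with labels `["sport", "sport", "finance", "finance"]`, a term that appears once in a single class scores as high as a term that appears in every document of that class. This insensitivity to feature frequency is a commonly cited weakness of traditional MI.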

Key words: text classification; feature selection; improved mutual information; Biomimetic Pattern Recognition

DOI:10.11916/j.issn.1005-9113.2011.03.027

CLC Number: TP391.1

Fund:

