
Feature Subset Selection Algorithm Based on Symmetric Uncertainty and Three-Way Interaction Information

Site editor: Free考研考试 / 2022-01-16

Authors: Gu Xiangyuan (顾翔元), Guo Jichang (郭继昌), Li Chongyi (李重仪), Xiao Lijun (肖利军)
Unit: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Abstract: Some existing feature subset selection algorithms evaluate redundant features with only a single metric, such as symmetric uncertainty or the maximal information coefficient, and their performance suffers as a result. To address this problem, a feature subset selection algorithm based on symmetric uncertainty and three-way interaction information (SUTII) is proposed. First, the symmetric uncertainty between each feature and the class label is computed, the features are sorted in descending order of this value, and irrelevant features are removed. Then the symmetric uncertainty between features and the three-way interaction information among features and the class label are computed; together with the symmetric uncertainty between features and the class label, they are used in comparison and ranking operations to remove redundant features and obtain the selected features. Because redundancy is evaluated with both symmetric uncertainty and three-way interaction information, combined with comparison and ranking operations, fewer relevant features are mistaken for redundant ones and removed, so some highly informative relevant features are retained. To validate its performance, SUTII is compared with four other feature subset selection algorithms on three UCI datasets and nine ASU datasets, using three classifiers: J48, IB1, and Naïve Bayes. Experimental results show that SUTII achieves good feature selection performance while selecting few features and taking little time.
Keywords: feature subset selection; three-way interaction information; symmetric uncertainty; feature selection; ranking
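
The two measures named in the abstract, and its first step (ranking features by relevance and dropping irrelevant ones), can be sketched for discrete data as follows. This is only an illustration under stated assumptions, not the authors' implementation: symmetric uncertainty is taken in its usual form SU(X,Y) = 2 I(X;Y) / (H(X) + H(Y)); three-way interaction information is taken with the sign convention I(X;Y;C) = I(X;Y|C) - I(X;Y), which the abstract does not specify; the helper names and the `threshold` parameter are hypothetical. The redundancy-removal rule is not described in the abstract, so it is not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def symmetric_uncertainty(x, y):
    """SU(X,Y) = 2 I(X;Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:  # both columns are constant
        return 0.0
    return 2.0 * mutual_information(x, y) / (hx + hy)


def interaction_information(x, y, c):
    """Three-way interaction information.

    Assumed sign convention (not stated in the abstract):
    I(X;Y;C) = I(X;Y|C) - I(X;Y), positive when X and Y are
    jointly more informative about C than they are separately.
    """
    # I(X;Y|C) = H(X,C) + H(Y,C) - H(X,Y,C) - H(C)
    cond_mi = entropy(x, c) + entropy(y, c) - entropy(x, y, c) - entropy(c)
    return cond_mi - mutual_information(x, y)


def rank_relevant(features, labels, threshold=0.0):
    """Sort features by SU with the class label, descending, and drop
    those whose SU does not exceed `threshold` (a hypothetical cutoff)."""
    scored = [(name, symmetric_uncertainty(col, labels))
              for name, col in features.items()]
    kept = [(name, su) for name, su in scored if su > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

A small XOR example shows why a second measure can matter: with x = [0, 0, 1, 1], y = [0, 1, 0, 1] and c = [0, 1, 1, 0], each feature alone has zero symmetric uncertainty with c, yet `interaction_information(x, y, c)` is 1 bit, because the pair determines c completely.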

Full-text PDF download: http://xbzrb.tju.edu.cn/#/digest?ArticleID=6593