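1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
2. Hebei Key Laboratory of Computer Virtual Technology and System Integration, Qinhuangdao, Hebei 066004, China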
Published online: 2020-02-25
Published in print: 2020
YOU Dian-long, GUO Song, ZHAO Chun-hui, et al. Online Feature Selection with Streaming Features for Classification[J]. Acta Electronica Sinica, 2020, 48(2): 321-332. DOI: 10.3969/j.issn.0372-2112.2020.02.015.
Online streaming feature selection reduces the dimensionality of a streaming feature space by filtering out irrelevant and redundant features in real time. Existing algorithms have known drawbacks: Alpha-investing has low classification accuracy, SAOLA selects many features, and Online Streaming Feature Selection (OSFS) has a long running time on datasets with low redundancy and high relevance. To address these problems, we propose OSFIC, a classification-oriented online feature selection algorithm for streaming features. OSFIC uses a four-layer filtering framework: it filters irrelevant new features by unconditional (null-conditional) independence, filters redundant new features and some of the redundant features in the candidate feature set by single-conditional mutual information, and finally filters the remaining redundant features in the candidate feature set by multi-conditional independence, yielding an approximate Markov blanket of the class label. To analyze the performance of OSFIC, we selected datasets from NIPS 2003 and the Causality Workbench and compared it with existing baseline algorithms in terms of prediction accuracy, number of selected features, running time, and AUC. Experiments show that the average classification accuracy of OSFIC is 4.41% higher than that of Alpha-investing. While maintaining accuracy, the average number of selected features is 41.9% lower than that of SAOLA, and the running time is 91.59% lower than that of OSFS. Finally, the effectiveness of OSFIC is verified in a real application scenario.
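The four-layer filtering described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `cond_mutual_info` and `select_streaming`, the parameters `eps` and `max_cond`, and the use of a fixed mutual-information threshold in place of a statistical independence test are all assumptions made for illustration, and the sketch handles discrete features only.

```python
from collections import Counter
from itertools import combinations
from math import log


def cond_mutual_info(x, y, z_cols=()):
    """Empirical I(X; Y | Z) in nats for discrete sequences; Z may be empty."""
    n = len(x)
    z = list(zip(*z_cols)) if z_cols else [()] * n
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    return sum(
        c / n * log(c * n_z[zi] / (n_xz[xi, zi] * n_yz[yi, zi]))
        for (xi, yi, zi), c in n_xyz.items()
    )


def select_streaming(features, label, eps=0.02, max_cond=2):
    """Process (name, column) pairs one at a time; return selected names.

    eps is an assumed threshold: information at or below it is treated
    as (conditional) independence.
    """
    selected = {}  # candidate feature set: name -> column
    for name, col in features:
        # Layer 1: drop a new feature that is unconditionally
        # independent of the label (irrelevant).
        if cond_mutual_info(col, label) <= eps:
            continue
        # Layer 2: drop a new feature made redundant by a single candidate.
        if any(cond_mutual_info(col, label, (c,)) <= eps
               for c in selected.values()):
            continue
        # Layer 3: drop candidates made redundant by the new feature alone.
        selected = {k: c for k, c in selected.items()
                    if cond_mutual_info(c, label, (col,)) > eps}
        selected[name] = col
        # Layer 4: drop candidates that are redundant given a larger
        # conditioning subset (sizes 2..max_cond) of the other candidates.
        for k in list(selected):
            others = [c for j, c in selected.items() if j != k]
            if any(cond_mutual_info(selected[k], label, s) <= eps
                   for r in range(2, max_cond + 1)
                   for s in combinations(others, r)):
                del selected[k]
    return list(selected)
```

Calling `select_streaming` with an iterable of `(name, column)` pairs and a label column returns the names of the features kept after all four layers. The cheap unconditional and single-condition checks run first, so the subset search in the last layer only touches features that survived them; the sketch makes no claim about matching the reported accuracy or running-time figures.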