Online Feature Selection with Streaming Features for Classification

YOU Dian-long (尤殿龙); GUO Song (郭松); ZHAO Chun-hui (赵春慧); YUAN Fu-yong (原福永); SHEN Li-min (申利民); CHEN Zhen (陈真)

doi:10.3969/j.issn.0372-2112.2020.02.015

Research paper | Updated: 2025-07-16

- Online Feature Selection with Streaming Features for Classification
- Acta Electronica Sinica, 2020, Vol. 48, No. 2, pp. 321-332
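- Author affiliations:
  
  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004
  2. Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, Hebei 066004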
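- Funding:
  
  National Natural Science Foundation of China (No. 61772450); China Postdoctoral Science Foundation (No. 2018M631764); Natural Science Foundation of Hebei Province (No. F2019203287, No. F2017203307); Science and Technology Program of Hebei Province (No. 17210701D); Hebei Province Postdoctoral Research Program (No. B2018003009); Scientific Research Program of the Hebei Provincial Department of Education (No. KCJSX2017028); Special Project for Basic Research of Yanshan University (No. 16SKY011); Doctoral Foundation of Yanshan University (No. BL18003)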
- DOI: 10.3969/j.issn.0372-2112.2020.02.015
  CLC number: TP39
- Published online: 2020-02-25;
  
  Print publication: 2020

YOU Dian-long, GUO Song, ZHAO Chun-hui, et al. Online Feature Selection with Streaming Features for Classification[J]. Acta Electronica Sinica, 2020, 48(2): 321-332. DOI: 10.3969/j.issn.0372-2112.2020.02.015.

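Abstract (translated from Chinese)

Online streaming feature selection reduces the dimensionality of a streaming feature space by filtering out irrelevant and redundant features in real time. Existing algorithms have notable weaknesses: Alpha-investing has low classification accuracy, SAOLA selects many features, and OSFS runs slowly on datasets with low redundancy and high relevance. To address these problems, this paper proposes OSFIC, a classification-oriented online feature selection algorithm for streaming features. OSFIC uses a four-layer filtering framework. It removes irrelevant new features with an unconditional independence test, removes redundant new features and some of the redundant features already in the candidate feature set using mutual information conditioned on a single feature, and finally removes the remaining redundant features in the candidate set with multi-conditional independence tests, yielding an approximate Markov blanket of the class label. To evaluate OSFIC, datasets from NIPS 2003 and the Causality Workbench were used to compare it with existing benchmark algorithms in terms of prediction accuracy, number of selected features, running time, and AUC. Experiments show that the average classification accuracy of OSFIC is 4.41% higher than that of Alpha-investing. While maintaining accuracy, OSFIC selects on average 41.9% fewer features than SAOLA and reduces running time by 91.59% compared with OSFS. Finally, the effectiveness of OSFIC is verified in a real-world application scenario.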
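The abstract names the kinds of tests OSFIC applies but not their formulas. The standard information-theoretic definitions below are offered only as background. The mapping of the four layers onto these conditions, the tolerance ε, and the use of the G² statistic to decide independence are assumptions for illustration, not details stated in the abstract. Here C is the class label, Y a newly arrived feature, S the current candidate set, 𝓕 the full feature set, N the number of samples, |X| the number of distinct values of X, and logarithms are natural.

```latex
\begin{aligned}
I(X;C) &= \sum_{x,c} p(x,c)\,\log\frac{p(x,c)}{p(x)\,p(c)}\\
I(X;C \mid Z) &= \sum_{x,c,z} p(x,c,z)\,\log\frac{p(z)\,p(x,c,z)}{p(x,z)\,p(c,z)}\\
X \perp C \mid Z \;&\Longleftrightarrow\; I(X;C \mid Z) = 0\\
G^2 &= 2N\,I(X;C \mid Z) \sim \chi^2_{\nu}\ \text{(asymptotically, under independence)},\quad
\nu = (|X|-1)(|C|-1)\prod_{j}|Z_j|\\
\mathrm{MB}(C) &:\ \text{a minimal } M \subseteq \mathcal{F} \text{ with } C \perp (\mathcal{F}\setminus M) \mid M
\end{aligned}

% One possible (assumed) reading of the four layers, with tolerance \varepsilon:
\begin{aligned}
&\text{Layer 1: discard } Y \text{ if } I(Y;C) \le \varepsilon\\
&\text{Layer 2: discard } Y \text{ if } \exists X \in S:\ I(Y;C \mid X) \le \varepsilon\\
&\text{Layer 3: discard } X \in S \text{ if } I(X;C \mid Y) \le \varepsilon\\
&\text{Layer 4: discard } X \in S \text{ if } \exists Z \subseteq S\setminus\{X\},\ |Z| \ge 2:\ I(X;C \mid Z) \le \varepsilon
\end{aligned}
```

With an empty conditioning set the product in ν equals 1, so the same formula covers the unconditional test of Layer 1.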

Abstract

Online streaming feature selection reduces the dimensionality of a streaming feature space by filtering out irrelevant and redundant features in real time. Existing algorithms such as Alpha-investing, SAOLA, and Online Streaming Feature Selection (OSFS) have been proposed for this purpose, but they have drawbacks: Alpha-investing has low prediction accuracy, SAOLA selects a large number of features, and OSFS has a long running time when the streaming features exhibit low redundancy and high relevance. We propose OSFIC, a novel classification-oriented online feature selection algorithm for streaming features. OSFIC uses a four-layer filtering framework: it filters irrelevant new features by null-conditional independence, filters redundant new features and redundant features in the candidate feature set by single-conditional mutual information, and finally filters the remaining redundant features in the candidate feature set by multi-conditional independence. The result is an approximate Markov blanket of the class label. To analyze the performance of the algorithm, we selected datasets from NIPS 2003 and the Causality Workbench and compared prediction accuracy, number of selected features, running time, and AUC with existing state-of-the-art algorithms. Experiments show that the average classification accuracy of OSFIC is 4.41% higher than that of Alpha-investing. While maintaining high accuracy, OSFIC selects on average 41.9% fewer features than SAOLA and reduces running time by 91.59% compared with OSFS. Finally, the effectiveness of OSFIC is verified in real-world scenarios.
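For experimentation, the four-layer filter described in the abstract can be sketched in a few dozen lines. This is not the authors' implementation. It assumes discrete features, plug-in (frequency-count) mutual-information estimates, and a fixed tolerance `eps` in place of a statistical test. The function names `cond_mutual_info`, `redundant_given_subsets`, and `osfic_sketch`, and the parameters `eps` and `max_cond`, are hypothetical. The exact redundancy criteria OSFIC uses may differ from the ones assumed here.

```python
import math
from collections import Counter
from itertools import combinations


def cond_mutual_info(x, c, z=()):
    """Plug-in estimate (nats) of I(X; C | Z) for discrete sequences.

    z is a sequence of conditioning columns; with z empty this is I(X; C).
    """
    n = len(c)
    zs = [tuple(col[i] for col in z) for i in range(n)]
    n_xcz = Counter(zip(x, c, zs))
    n_xz = Counter(zip(x, zs))
    n_cz = Counter(zip(c, zs))
    n_z = Counter(zs)
    total = 0.0
    for (xi, ci, zi), k in n_xcz.items():
        # p(x,c,z) * log[p(z) p(x,c,z) / (p(x,z) p(c,z))]; the 1/n factors cancel
        total += (k / n) * math.log(k * n_z[zi] / (n_xz[(xi, zi)] * n_cz[(ci, zi)]))
    return total


def redundant_given_subsets(s, selected, data, c, eps, max_cond):
    """Layer 4 helper (assumed criterion): is candidate s (nearly) independent
    of the label given some subset of 2..max_cond other candidates?"""
    others = [t for t in selected if t != s]
    for k in range(2, min(max_cond, len(others)) + 1):
        for subset in combinations(others, k):
            if cond_mutual_info(data[s], c, [data[t] for t in subset]) <= eps:
                return True
    return False


def osfic_sketch(stream, c, eps=0.01, max_cond=2):
    """Hypothetical four-layer filter over a stream of (name, column) pairs."""
    selected, data = [], {}
    for name, col in stream:
        # Layer 1: drop a new feature that is (nearly) independent of the label.
        if cond_mutual_info(col, c) <= eps:
            continue
        # Layer 2: drop the new feature if a single candidate makes it redundant.
        if any(cond_mutual_info(col, c, [data[s]]) <= eps for s in selected):
            continue
        # Layer 3: drop candidates made redundant by the new feature alone.
        selected = [s for s in selected
                    if cond_mutual_info(data[s], c, [col]) > eps]
        selected.append(name)
        data[name] = col
        # Layer 4: re-check candidates against small multi-feature conditioning sets.
        for s in list(selected):
            if redundant_given_subsets(s, selected, data, c, eps, max_cond):
                selected.remove(s)
    return selected


if __name__ == "__main__":
    label = [0, 0, 1, 1] * 5
    stream = [("weak", [0, 0, 0, 1] * 5),   # partially informative
              ("noise", [0, 1, 0, 1] * 5),  # independent of the label
              ("strong", list(label))]      # a perfect copy of the label
    print(osfic_sketch(stream, label))  # ['strong']
```

In the example, `noise` is rejected by Layer 1, and `weak` is admitted first but later removed by Layer 3 once `strong` arrives. Plug-in estimates are biased upward for features with many values. Conditioning on several features with few samples leaves sparse strata in which the estimate can collapse toward zero. Capping the conditioning-set size with `max_cond`, or replacing the fixed threshold with a G² test, is one way to limit both effects.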

