Acta Electronica Sinica ›› 2022, Vol. 50 ›› Issue (11): 2778-2789. DOI: 10.12263/DZXB.20211263

• Research Article •

Hierarchical Feature Selection from Coarse to Fine

LIU Hao-yang1,2, LIN Yao-jin1,2, LIU Jing-hua3, WU Yi-lin1,2, MAO Yu1,2, LI Shao-zi4

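  1. School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China
  2. Key Laboratory of Data Science and Intelligence Application (Minnan Normal University), Fujian Province University, Zhangzhou, Fujian 363000, China
  3. Department of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021, China
  4. Department of Artificial Intelligence, Xiamen University, Xiamen, Fujian 361005, China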
  • Received: 2021-09-14 Revised: 2021-11-02 Publication date: 2022-11-25
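    • About the authors:
    • LIU Hao-yang, male, was born in Longyan, Fujian Province, in July 1998. He is currently a master's student at Minnan Normal University. His main research interest is data mining. E-mail: liuhaoyang98@163.com
      LIN Yao-jin (corresponding author), male, was born in Zhangzhou, Fujian Province, in October 1980. He is currently a professor at the School of Computer Science, Minnan Normal University. His main research interests are data mining and machine learning.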
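    • Funding:
    • General Program of the National Natural Science Foundation of China (62076116); Key Program of the Natural Science Foundation of Fujian Province (2021J02049)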

Hierarchical Feature Selection from Coarse to Fine

LIU Hao-yang1,2, LIN Yao-jin1,2, LIU Jing-hua3, WU Yi-lin1,2, MAO Yu1,2, LI Shao-zi4   

  1. School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China
  2. Key Laboratory of Data Science and Intelligence Application (Minnan Normal University), Fujian Province University, Zhangzhou, Fujian 363000, China
  3. Department of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021, China
  4. Department of Artificial Intelligence, Xiamen University, Xiamen, Fujian 361005, China
  • Received: 2021-09-14 Revised: 2021-11-02 Online: 2022-11-25 Published: 2022-11-19

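Abstract:

Classification learning that exploits the hierarchical structure among data categories is widespread in fields such as disease diagnosis and image annotation. However, the high dimensionality of the feature space burdens hierarchical classification learning with high time complexity and heavy storage costs. Moreover, existing studies all assume that the labels of the training set are sufficiently fine-grained, which conflicts with practical hierarchical classification, where assigning fine-grained labels is costly and category labels can be semantically ambiguous. To address these problems, a coarse-to-fine hierarchical feature selection algorithm is proposed. The algorithm considers intra-class consistency and the differences between sibling nodes to select representative features, and it predicts the unknown fine-grained labels of the training samples during feature selection. Experimental results on seven benchmark datasets show that the proposed algorithm achieves better classification performance than several advanced baseline algorithms and can handle cases where the label granularity is not sufficiently fine.

Keywords: feature selection, hierarchical classification, label hierarchy, label granularity, recursive regularization, sparse optimization, global optimal solution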

Abstract:

Classification learning that uses a hierarchy of categories arises in many practical applications, such as disease diagnosis and image annotation. However, the high dimensionality of the feature space confronts hierarchical classification learning with high time and space complexity. In addition, existing work assumes that the label granularity of the training set is sufficiently fine, which contradicts practical hierarchical classification learning, where assigning fine-grained labels is costly and ambiguity exists among category labels. To solve these problems, we propose a coarse-to-fine hierarchical feature selection algorithm. It considers intra-class consistency and inter-sibling variability to select representative features, and it predicts the unknown fine-grained labels of the training samples during feature selection. Experimental results on seven benchmark datasets show that the proposed algorithm outperforms several advanced comparison algorithms in classification performance and can handle the case where the label granularity is not fine enough.

Key words: feature selection, hierarchical classification, label hierarchical structure, label granularity, recursive regularization, sparse optimization, global optimal solution

CLC number: