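1. School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China
2. Key Laboratory of Data Science and Intelligence Application, Fujian Province University (Minnan Normal University), Zhangzhou, Fujian 363000, China
3. College of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021, China
4. Department of Artificial Intelligence, Xiamen University, Xiamen, Fujian 361005, China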
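LIU Hao-yang: male, born in July 1998 in Longyan, Fujian Province. He is currently a master's student at Minnan Normal University. His main research interest is data mining. E-mail: liuhaoyang98@163.com
LIN Yao-jin (corresponding author): male, born in October 1980 in Zhangzhou, Fujian Province. He is currently a professor at the School of Computer Science, Minnan Normal University. His main research interests are data mining and machine learning.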
Received: 2021-09-14
Revised: 2021-11-02
Print publication: 2022-11-25
LIU Hao-yang, LIN Yao-jin, LIU Jing-hua, et al. Hierarchical Feature Selection from Coarse to Fine[J]. Acta Electronica Sinica, 2022, 50(11): 2778-2789. DOI: 10.12263/DZXB.20211263.
Classification learning that exploits the hierarchical structure among data categories is common in practical applications such as disease diagnosis and image annotation. However, the high dimensionality of the feature space burdens hierarchical classification learning with high time complexity and heavy storage costs. In addition, existing work assumes that the labels of the training set are sufficiently fine-grained. This assumption conflicts with practice, where assigning fine-grained labels is costly and category labels can be semantically ambiguous. To address these problems, we propose a coarse-to-fine hierarchical feature selection algorithm. The algorithm selects representative features by considering intra-class consistency and the differences among sibling nodes, and it predicts the unknown fine-grained labels of the training samples during feature selection. Experimental results on seven benchmark datasets show that the proposed algorithm outperforms several advanced comparison algorithms in classification performance and can handle cases where the label granularity is not fine enough.
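To make the terms in the abstract concrete, the following is a minimal Python sketch of a class hierarchy with coarse labels, fine labels, and sibling nodes. It is an illustration only, not the algorithm proposed in the paper: the toy hierarchy and the helper names (`PARENT`, `ancestors`, `coarse_label`, `siblings`, `candidate_fine_labels`) are all assumptions introduced here.

```python
# Hypothetical toy hierarchy (not from the paper), stored as child -> parent.
PARENT = {
    "animal": "root",
    "vehicle": "root",
    "cat": "animal",
    "dog": "animal",
    "car": "vehicle",
    "bus": "vehicle",
}


def ancestors(node):
    """Return the path from `node` up to, but excluding, the root."""
    path = []
    while node in PARENT:
        path.append(node)
        node = PARENT[node]
    return path


def coarse_label(node, depth=1):
    """Return the ancestor of `node` at `depth` (children of the root are depth 1).

    If `node` is shallower than `depth`, the node itself is returned.
    """
    top_down = ancestors(node)[::-1]
    return top_down[min(depth, len(top_down)) - 1]


def siblings(node):
    """Return the other nodes that share a parent with `node`."""
    parent = PARENT[node]
    return sorted(n for n, p in PARENT.items() if p == parent and n != node)


def candidate_fine_labels(coarse):
    """Return the children of a coarse label.

    For a sample labeled only at the coarse level, these are the
    candidates for its unknown fine-grained label.
    """
    return sorted(n for n, p in PARENT.items() if p == coarse)


if __name__ == "__main__":
    print(coarse_label("cat"))              # coarse class of a fine label
    print(siblings("cat"))                  # classes to be told apart from "cat"
    print(candidate_fine_labels("animal"))  # possible fine labels under "animal"
```

In this picture, a training sample labeled only as `animal` has `cat` and `dog` as candidate fine-grained labels, and features that separate `cat` from its sibling `dog` are the kind of "differences among sibling nodes" the abstract refers to.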