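1. School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400030
2. Chongqing Radio and TV University, Chongqing 400052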
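LI Fan, male, born in 1993 in Hubei Province. Ph.D. candidate. His main research interests include imbalanced data processing and machine learning. E-mail: 979940181@qq.com
ZHANG Xiao-heng, male, born in 1980 in Dazhou, Sichuan. Ph.D. candidate and associate professor. His main research interests include medical signal processing and machine learning. E-mail: 7818320@qq.com
LI Yong-ming, male, born in 1976 in Sichuan. Ph.D., professor, and doctoral supervisor. His main research interests include medical signal processing and machine learning. Member of the Chinese Institute of Electronics (membership No. E190020470M). E-mail: yongmingli@cqu.edu.cn
WANG Pin, female, born in 1979 in Jiangsu Province. Ph.D., associate professor, and master's supervisor. Her main research interests include image processing and recognition. E-mail: wangpin@cqu.edu.cn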
Received: 2022-06-20
Revised: 2022-11-03
Published in print: 2024-03-25
LI Fan, ZHANG Xiao-heng, LI Yong-ming, et al. Imbalanced Ensemble Algorithm Based on Envelope Learning and Hierarchical Structure Consistency Mechanism[J]. Acta Electronica Sinica, 2024, 52(03): 751-761. DOI: 10.12263/DZXB.20220712 (in Chinese)
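Ensemble methods are an important branch of imbalanced learning. However, existing imbalanced ensemble methods all operate on the original instances without considering their structure information, which comprises both local and global structure, so their effectiveness remains limited. To address this problem, this paper proposes an imbalanced ensemble learning algorithm based on a Deep Instance Envelope Network (DIEN) and a Hierarchical Structure Consistency Mechanism (HSCM). Taking local manifold and global structure information into account, the algorithm generates high-quality multilayer envelope instances through multilayer instance clustering, thereby balancing the classes. First, DIEN is designed on the basis of instance neighborhood concatenation and the fuzzy C-means clustering algorithm to mine the structure information of the instances and obtain deep envelope instances. Second, a local manifold structure measure and a global structure distribution measure are designed to build HSCM, which enhances the distribution consistency of instances between layers. Third, DIEN and HSCM are combined into an optimized deep instance envelope network, DH (DIEN with HSCM). Fourth, base classifiers are applied to the envelope instances. Finally, a bagging ensemble learning mechanism is designed to fuse the predictions of the base classifiers. Several groups of experiments use more than ten public datasets and representative related algorithms for verification and comparison. The results show that the proposed algorithm is significantly the best on four performance metrics, including AUC (Area Under Curve) and F-measure. The complete experimental results and analysis are available at https://pan.baidu.com/s/15lZ9GztB95ySrNwEmtrCfA (extraction code: 1111).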
Ensemble methods have become an important branch of imbalanced learning. However, the existing imbalanced ensemble methods all rely on the original instances without considering the structure information of the instances, so their effectiveness is still limited. Research shows that the structure information of instances includes local and global structure information. To solve this problem, this paper proposes an imbalanced ensemble algorithm based on a deep instance envelope network (DIEN) and a hierarchical structure consistency mechanism (HSCM). Considering the local manifold and global structure information, the algorithm generates high-quality deep envelope instances to achieve class balance. First, based on instance neighborhood concatenation and the fuzzy c-means clustering algorithm, the DIEN is designed to mine the structure information of instances and obtain the deep envelope instances. Second, a local manifold structure measure and a global structure distribution measure are designed to construct the HSCM, which enhances the distribution consistency of interlayer instances. Third, DIEN and HSCM are combined to construct the optimized deep instance envelope network, DH (DIEN with HSCM). Fourth, the base classifiers are applied to the deep envelope instances. Finally, a bagging ensemble learning mechanism is designed to fuse the predictions of the base classifiers and obtain the final results. Several groups of experiments are organized at the end of the paper, using more than 10 public datasets and representative related algorithms for verification. Experimental results show that the proposed algorithm is significantly better on four performance metrics, including AUC (Area Under Curve) and F-measure.
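The envelope idea can be made concrete with a small sketch. The code below is not the authors' DIEN implementation; it is a single-layer illustration under stated assumptions: each majority-class instance is concatenated with its k nearest neighbours, the concatenated vectors are clustered with a plain fuzzy c-means, and the leading feature block of each cluster centre is taken as an envelope instance. Compressing the majority class to the minority-class size, the choice k = 2, the fuzzifier m = 2, and the helper names `fuzzy_c_means`, `neighbor_concat`, and `envelope_layer` are all hypothetical choices made for illustration. HSCM and the bagging stage are omitted.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Random initial membership matrix whose rows sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the instances.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance of every instance to every center (floored to avoid division by zero).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def neighbor_concat(X, k=2):
    """Append each instance's k nearest neighbours (Euclidean) to its own features."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # exclude the instance itself
    idx = np.argsort(d, axis=1)[:, :k]
    return np.hstack([X, X[idx].reshape(X.shape[0], -1)])


def envelope_layer(X_major, n_envelopes, k=2, seed=0):
    """One illustrative envelope layer (assumption, not the paper's exact procedure)."""
    Z = neighbor_concat(X_major, k)
    centers, _ = fuzzy_c_means(Z, n_envelopes, seed=seed)
    # Keep only the block that corresponds to the original feature space.
    return centers[:, : X_major.shape[1]]


# Toy imbalanced data: 60 majority vs. 10 minority instances in 2-D.
rng = np.random.default_rng(42)
X_major = rng.normal(0.0, 1.0, size=(60, 2))
X_minor = rng.normal(3.0, 0.5, size=(10, 2))

envelopes = envelope_layer(X_major, n_envelopes=len(X_minor))
X_bal = np.vstack([envelopes, X_minor])
y_bal = np.array([0] * len(envelopes) + [1] * len(X_minor))
print(X_bal.shape, np.bincount(y_bal))
```

In this sketch the balanced set `X_bal`, `y_bal` would then be passed to a base classifier. A deeper network in the spirit of the abstract would apply `envelope_layer` repeatedly and check distribution consistency between layers, which is the role the paper assigns to HSCM.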