

School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
Received: 26 October 2023
Revised: 8 March 2024
Published: 25 December 2024
YAN Fang, MA Jie, LI Yong-ming, et al. Manifold Nearest Neighbor Sample Envelope and Hierarchical Multitype Transform Algorithm for Ensemble Learning[J]. Acta Electronica Sinica, 2024, 52(12): 4125-4141. DOI:10.12263/DZXB.20231002
Ensemble learning is an important branch of machine learning and an active research topic. The prevailing paradigm is to derive multiple sample subsets from the original sample set, train a base classifier on each subset, and combine the base classifiers' outputs. The main problem with this paradigm is that, because every subset is drawn from the same original sample set, diversity among the subsets is significantly reduced. The problem is especially serious when the original sample set is small, the sampling ratio is large, and the degree of imbalance is high. In addition, when the separability of the original sample set is low, resampling yields only a limited improvement in the separability of the subsets. To address these problems, this paper proposes a manifold nearest neighbor sample envelope and hierarchical multitype transformation algorithm for ensemble learning. It aims to improve the diversity and separability of the sample subsets by using an envelope mechanism and multitype sample transformation to convert the original sample set into differentiated, hierarchical envelope sample sets. First, a manifold nearest neighbor sample envelope mechanism is designed to convert the original samples into sample envelopes. Second, multitype sample transformation is applied to the sample envelopes to reconstruct them into hierarchical envelope samples. Third, an inter-layer consistency preservation mechanism based on joint structure domain adaptation is designed to keep the sample distributions before and after transformation consistent, thereby improving how well the envelope samples represent the original samples. Fourth, feature dimensionality reduction and base classifier training are performed separately for each layer of envelope samples. Finally, a two-dimensional decision fusion mechanism produces the final classification result. More than ten datasets and several related algorithms are used for validation. The results show that, compared with the original sample set, the hierarchical envelope sample sets constructed by the proposed algorithm increase the diversity of the sample subsets and improve ensemble learning performance, raising accuracy by up to 18.56%. Compared with related ensemble learning algorithms, accuracy improves by up to 7.56%. This work offers a new approach to improving existing ensemble learning algorithms by shifting from "ensemble learning directly based on original samples" to "ensemble learning based on hierarchical envelope samples".
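The baseline paradigm that the abstract describes (draw sample subsets from the original set, train one base classifier per subset, combine the outputs) can be sketched as follows. This is a minimal illustration and not the authors' algorithm: the nearest-centroid base classifier, the synthetic two-class data, the function names, and the mean pairwise disagreement used as a diversity measure are all assumptions chosen for brevity.

```python
import numpy as np


def nearest_centroid_fit(X, y):
    """Fit a deliberately simple base classifier: one centroid per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(model, X):
    """Assign each sample to the class of its closest centroid."""
    classes, centroids = model
    # Squared Euclidean distance from every sample to every centroid.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]


def bagging_predictions(X_train, y_train, X_test, n_subsets=10, ratio=0.8, seed=0):
    """Train one base classifier per bootstrap subset; return all predictions."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    m = max(2, int(round(ratio * n)))  # subset size set by the sampling ratio
    preds = []
    for _ in range(n_subsets):
        idx = rng.choice(n, size=m, replace=True)  # resample the original set
        model = nearest_centroid_fit(X_train[idx], y_train[idx])
        preds.append(nearest_centroid_predict(model, X_test))
    return np.stack(preds)  # shape: (n_subsets, n_test)


def majority_vote(preds):
    """Combine base classifier outputs; labels are assumed to be 0..C-1."""
    n_classes = int(preds.max()) + 1
    counts = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)


def mean_pairwise_disagreement(preds):
    """Average fraction of test samples on which two base classifiers differ."""
    k = preds.shape[0]
    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            total += float(np.mean(preds[i] != preds[j]))
            pairs += 1
    return total / pairs


# Synthetic two-class data (an assumption; the paper uses real datasets).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)),
               rng.normal(1.5, 1.0, size=(30, 2))])
y = np.array([0] * 30 + [1] * 30)
perm = rng.permutation(60)
X, y = X[perm], y[perm]
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]

for ratio in (0.3, 1.0):
    preds = bagging_predictions(X_train, y_train, X_test, ratio=ratio)
    vote = majority_vote(preds)
    print(f"ratio={ratio}: disagreement={mean_pairwise_disagreement(preds):.3f}, "
          f"accuracy={np.mean(vote == y_test):.3f}")
```

Comparing the disagreement printed for the two sampling ratios is one rough way to see how subsets drawn from the same original set can produce similar base classifiers. The exact numbers depend on the data and the seed, so no particular values should be expected.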