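1. Institute of Operations Research and Optimization, Liaoning Technical University, Fuxin, Liaoning 123000
2. Institute of Optimization and Decision-Making, Liaoning Technical University, Fuxin, Liaoning 123000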
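GAO Lei-fu, male, born in February 1963 in Fuxin, Liaoning. Ph.D., professor, and doctoral supervisor. His main research interests are optimization theory and methods, machine learning, and data analysis. E-mail: gaoleifu@163.com
ZHANG Meng-yao, female, born in January 1996 in Hulunbuir, Inner Mongolia. Master's student. Her main research interests are machine learning and data analysis. E-mail: mengyaoz119@163.com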
Received: 2021-02-22
Revised: 2021-10-21
Published in print: 2022-10-25
GAO Lei-fu, ZHANG Meng-yao, ZHAO Shi-jie. Mixed-Sampling Algorithm Combining Cluster Boundary Movement and Adaptive Synthesis[J]. Acta Electronica Sinica, 2022, 50(10): 2517-2529. DOI: 10.12263/DZXB.20210265.
To address the intra-class sub-clustering and class-overlap problems of the pseudo-negative sampling (PNS) algorithm, this paper proposes an improved mixed-sampling algorithm (Improved Cluster Boundary Negative Movement Strategy, ICBNMS) that combines a cluster boundary negative movement strategy (CBNMS) with an adaptive positive synthesis technology (ADPST), with the aim of improving both the overall classification performance on imbalanced data and the recognition accuracy of the positive class. CBNMS partitions the positive and negative samples using agglomerative hierarchical clustering (AGNES) and uses the similarity relations among local samples to identify cluster-boundary negative samples, that is, samples in the potential negative class that are highly correlated with the positive class, which improves the local accuracy and time efficiency of sampling. To further strengthen the ability of CBNMS to identify regions where positive samples overlap, ICBNMS first balances the data by moving the cluster-boundary negative samples and then applies ADPST, which weights a composite of sparsity and distance factors to adaptively determine the best region for generating samples, thereby reducing sample overlap and enriching sample diversity. Experimental results show that, compared with other sampling algorithms, ICBNMS achieves the best values of G-mean, F-measure, and other metrics in multiple groups of experiments on 10 imbalanced data sets, and its time efficiency is 32.27% and 27.88% higher than that of the CDSMOTE and PNS algorithms, respectively, indicating better robustness and generalization.
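For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how G-mean and F-measure are commonly computed for a binary imbalanced problem. It uses the standard textbook definitions (G-mean as the geometric mean of the true positive rate and true negative rate, F-measure as the weighted harmonic mean of precision and recall); the function names, the `positive` label convention, and the toy labels are illustrative assumptions and are not taken from the paper, whose exact evaluation protocol is not given in the abstract.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem; `positive` marks the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of true positive rate and true negative rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data (hypothetical): 2 positive and 8 negative samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
all_negative = [0] * len(y_true)  # a classifier that ignores the minority class

print(g_mean(y_true, y_pred), f_measure(y_true, y_pred))
print(g_mean(y_true, all_negative), f_measure(y_true, all_negative))
```

Both metrics fall to zero for a classifier that always predicts the majority class, even though its plain accuracy on this toy data would be 80%, which is why they are preferred over accuracy when evaluating sampling methods on imbalanced data.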