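1. School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
3. Chongqing Key Laboratory of Computational Intelligence, Chongqing 400065, China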
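DAI Jin, male, born in 1978. Professor, Ph.D. His research interests include big data knowledge engineering and intelligent information processing. E-mail: daijin@cqupt.edu.cn
LI Hao, male, born in 1996. Master's student. His main research areas are machine learning and stream data mining. E-mail: S200201023@stu.cqupt.edu.cn
WANG Guo-yin, male, born in 1970. Professor and doctoral supervisor. His research interests include granular computing, multi-granularity cognitive computing, cognitive computing, and intelligent information processing. E-mail: wanggy@cqupt.edu.cn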
Received: 2023-02-10
Revised: 2023-07-29
Published in print: 2024-09-25
DAI Jin, LI Hao, WANG Guo-yin. Concept Drift Adaptive Prediction Method Based on Dynamic Sample Selection[J]. Acta Electronica Sinica, 2024, 52(09): 3228-3239. DOI:10.12263/DZXB.20230124
Concept drift is an important factor affecting the performance of stream data mining. It is currently handled mainly by incrementally updating or retraining the model, which leaves existing knowledge underused. Starting from the comprehensive use of all samples, this paper proposes a concept drift adaptive prediction method based on dynamic sample selection. When a new sample arrives, the method performs drift detection based on local consistency. When drift is detected, it removes the noisy samples in the affected region, and when a new concept is detected, it reuses historically similar concepts. Finally, the samples of each class in the region are summarized into multiple representative points, and the prediction model is updated at the same time. The denoising effect is verified on synthetic datasets containing different drift types, and prediction tasks are carried out on real datasets. The experimental results show that the method effectively removes the drift noise caused by concept drift, improves the performance of the prediction model, and achieves better overall prediction performance than popular concept drift adaptive models.
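
The abstract does not specify how local consistency is measured or how noisy samples are chosen, so the following is only an illustrative sketch and not the authors' algorithm. It assumes that local consistency is the fraction of the k nearest stored samples that share the new sample's label, that drift is suspected when this fraction falls below a fixed threshold, and that the conflicting neighbours are then treated as drift noise and dropped. The names `local_consistency`, `process_sample`, `k`, and `threshold` are all hypothetical.

```python
import math


def k_nearest(x, memory, k):
    """Return the k stored (features, label) pairs closest to x (Euclidean)."""
    return sorted(memory, key=lambda item: math.dist(x, item[0]))[:k]


def local_consistency(x, y, memory, k=3):
    """Assumed measure: share of the k nearest stored samples labelled y."""
    neighbours = k_nearest(x, memory, k)
    if not neighbours:
        return 1.0  # nothing to disagree with yet
    return sum(1 for _, label in neighbours if label == y) / len(neighbours)


def process_sample(x, y, memory, k=3, threshold=0.5):
    """Check one labelled sample and update the stored samples in place.

    Returns True when drift is suspected. In that case the neighbours whose
    label conflicts with the new sample are removed as assumed drift noise.
    """
    drift = local_consistency(x, y, memory, k) < threshold
    if drift:
        conflicts = [item for item in k_nearest(x, memory, k) if item[1] != y]
        for item in conflicts:
            memory.remove(item)
    memory.append((x, y))
    return drift
```

Here `memory` is a plain list of `(feature_tuple, label)` pairs. Concept reuse and multi-representative point summarization, which the abstract also mentions, are left out because the abstract gives too little detail to sketch them responsibly.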