

College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang 310023
Received: 16 January 2025
Revised: 8 May 2025
Published: 25 June 2025
LU Hao-yang, FAN Yu-lei, GAO Nan, et al. An Incremental Density-Based Clustering Algorithm for Concept Drift Detection and Adaption over Data Stream[J]. Acta Electronica Sinica, 2025, 53(06): 2050-2062. DOI:10.12263/DZXB.20250060
To address concept drift in non-stationary data streams that evolve over time, this paper proposes an InCremental Density-based Clustering algorithm (ICDC) designed for concept drift detection and adaptation over data streams. ICDC improves the one-pass clustering framework with a lazy outlier-handling mechanism, in which the arrival of new data triggers outlier evaluation to distinguish potential micro-clusters from noise. During clustering, data points and micro-clusters must satisfy both a feature-dependency condition and a temporal-dependency condition, which effectively removes anomalous points from the outlier set. This prevents a limitation of existing outlier-handling methods, in which absorbing anomalous points causes cluster structures to deteriorate progressively and irreversibly. ICDC also includes an outlier life-cycle adjustment mechanism that keeps the growth of the outlier buffer under control. Using changes in cluster structure as concept drift indicators, we design a corresponding detection algorithm that makes incremental density-based clustering more sensitive to both local and global pattern changes as the data stream evolves. We evaluate ICDC on multiple real and synthetic datasets in terms of clustering quality and performance, concept drift detection and adaptation, and memory and computational overhead. Experimental results show that ICDC produces better clustering results than existing algorithms on most datasets while effectively detecting concept drift.
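The abstract describes ICDC's mechanisms only at a high level. The following is therefore a minimal Python sketch of the general idea under stated assumptions; it is not the authors' implementation.

The sketch makes these assumptions:
- The "feature-dependency" condition means a Euclidean distance of at most `eps`.
- The "temporal-dependency" condition means arrival times at most `tau` steps apart.
- Buffered outliers expire after `max_age` steps, as a stand-in for the life-cycle mechanism.
- Creating any new micro-cluster counts as a change in cluster structure, which serves as a crude drift signal.

The names `MicroCluster`, `LazyOutlierClusterer`, `eps`, `min_pts`, `tau` and `max_age` are hypothetical. Outlier evaluation is lazy: the buffer is only examined when a newly arrived point fails to join an existing micro-cluster.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    center: list[float]
    count: int
    last_seen: int

    def absorb(self, x, t):
        # Incremental mean update: c <- c + (x - c) / n
        self.count += 1
        self.center = [c + (xi - c) / self.count for c, xi in zip(self.center, x)]
        self.last_seen = t


class LazyOutlierClusterer:
    """Toy one-pass clusterer. The rules below are illustrative assumptions,
    not ICDC's published definitions."""

    def __init__(self, eps=1.0, min_pts=3, tau=5, max_age=10):
        self.eps = eps          # assumed feature-dependency radius
        self.min_pts = min_pts  # points needed to promote buffered outliers
        self.tau = tau          # assumed temporal-dependency window
        self.max_age = max_age  # assumed life cycle of a buffered outlier
        self.clusters: list[MicroCluster] = []
        self.buffer: list[tuple[list[float], int]] = []  # (point, arrival time)

    def _expire_outliers(self, t):
        # Life-cycle rule: forget buffered outliers older than max_age.
        self.buffer = [item for item in self.buffer if t - item[1] <= self.max_age]

    def insert(self, x, t):
        """Process point x arriving at time t.

        Returns True when a new micro-cluster is created, which this sketch
        treats as a change in cluster structure (a crude drift signal).
        """
        self._expire_outliers(t)
        if self.clusters:
            nearest = min(self.clusters, key=lambda mc: math.dist(mc.center, x))
            if math.dist(nearest.center, x) <= self.eps:
                nearest.absorb(x, t)
                return False
        # Lazy evaluation: a buffered point qualifies as a partner only if it is
        # close in feature space AND arrived within tau steps of x.
        partners = [item for item in self.buffer
                    if math.dist(item[0], x) <= self.eps and t - item[1] <= self.tau]
        if len(partners) + 1 >= self.min_pts:
            mc = MicroCluster(center=list(x), count=1, last_seen=t)
            for p, _ in partners:
                mc.absorb(p, t)
            used = {id(item) for item in partners}
            self.buffer = [item for item in self.buffer if id(item) not in used]
            self.clusters.append(mc)
            return True
        self.buffer.append((list(x), t))
        return False


if __name__ == "__main__":
    stream = [
        (0, (0.0, 0.0)), (1, (0.2, 0.1)), (2, (0.1, 0.3)),      # concept A forms
        (3, (5.0, 5.0)),                                         # isolated point
        (4, (0.3, 0.2)),                                         # joins concept A
        (5, (10.0, 10.0)), (6, (10.2, 9.9)), (7, (9.9, 10.1)),   # shift to concept B
        (20, (10.1, 10.0)),                                      # isolated point has expired
    ]
    model = LazyOutlierClusterer(eps=1.0, min_pts=3, tau=5, max_age=10)
    changes = [t for t, x in stream if model.insert(x, t)]
    print(changes)                                  # [2, 7]
    print(len(model.clusters), len(model.buffer))   # 2 0
```

How to read the run:
- The flag at t=2 only marks the initial formation of concept A.
- The flag at t=7 marks the emergence of concept B after the shift.
- The isolated point at t=3 is never promoted to a micro-cluster. It is dropped from the buffer once it exceeds `max_age`.

To capture global pattern changes, a practical detector would also need to track micro-clusters fading or merging, which this sketch omits.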