

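1. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan, Shanxi 030006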
Received: 25 May 2025
Accepted: 11 October 2025
Published: 25 October 2025
SU Rui, GUO Hu-sheng, WANG Jing, et al. Concept Evolution Detection Method for Open Feature Space[J]. Acta Electronica Sinica, 2025, 53(10): 3718-3729. DOI:10.12263/DZXB.20250416
In many real-world scenarios, data are generated continuously as streams. Because streaming data change dynamically, new classes may emerge during generation, a phenomenon known as concept evolution. Concept evolution is a major cause of degraded, or even failed, predictive performance in stream mining models. Concept evolution detection methods, which promptly identify changes in the class space and prompt the model to adapt, have therefore attracted wide attention. However, most existing methods are built on the assumption that the feature space is static. In real scenarios the feature space is also dynamic and open: over time, some features may disappear and new features may emerge, which violates this assumption and causes existing algorithms to fail. To address this problem, this paper proposes a Concept evolution Detection method for Open Feature space (CD_OF). The method builds a micro-cluster ensemble model to classify incoming instances. When old features disappear, a transfer matrix maps the information they carry into the shared features; when new features emerge, the shared feature space is expanded and the ensemble model is rebuilt. On this basis, inter-sample similarity is defined from the shared neighborhood information of the samples to detect concept evolution, and a dynamic decay model is established to handle class disappearance and class recurrence in the open feature space. Experimental results show that the proposed method responds promptly to feature changes in the open feature space and improves concept evolution detection. On real streaming data with a changing feature space, it reduces the error rate by 1.7% to 11.4% compared with existing methods.
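The abstract does not state the similarity formula, so the following is only a minimal sketch of one common way to define similarity from shared neighborhood information: two samples are similar to the extent that their k-nearest-neighbor sets overlap. The function names (`k_nearest`, `snn_similarity`), the Euclidean distance, the normalization by `k`, and the toy data are all assumptions made for illustration, not the CD_OF definition.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_nearest(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(points, k):
    """Shared-nearest-neighbor similarity (assumed form):
    the fraction of the k neighbors that two points have in common."""
    nbrs = k_nearest(points, k)
    n = len(points)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim[i][j] = len(nbrs[i] & nbrs[j]) / k
    return sim


# Hypothetical toy data: two tight groups in a 2-D shared feature space.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
]
sim = snn_similarity(points, k=3)

# Samples in the same group share neighbors; samples in different groups do not.
print(sim[0][1], sim[0][4])
```

Under this assumed definition, an incoming sample that shares few or no neighbors with the samples of any known class would be a natural candidate for a new class, which is the kind of signal a concept evolution detector looks for.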