

浏览全部资源
扫码关注微信
1.江苏师范大学计算机与科学技术学院,江苏徐州 221116
2.江苏师范大学江苏省高校教育智能技术重点实验室,江苏徐州 221116
Received:08 December 2023,
Revised:2024-05-31,
Published:25 October 2024
移动端阅览
杜明晶, 吴福玉, 李宇蕊, 等. 侵蚀聚类[J]. 电子学报, 2024, 52(10): 3459-3471.
DU Ming-jing, WU Fu-yu, LI Yu-rui, et al. Erosion Clustering[J]. Acta Electronica Sinica, 2024, 52(10): 3459-3471.
杜明晶, 吴福玉, 李宇蕊, 等. 侵蚀聚类[J]. 电子学报, 2024, 52(10): 3459-3471. DOI:10.12263/DZXB.20231146
DU Ming-jing, WU Fu-yu, LI Yu-rui, et al. Erosion Clustering[J]. Acta Electronica Sinica, 2024, 52(10): 3459-3471. DOI:10.12263/DZXB.20231146
基于密度的聚类是一种经典的聚类分析方法,它能够在不指定类簇数目的情况下发现非球形类簇.但真实复杂数据集中存在类簇边界模糊、数据密度不均、数据分布复杂等问题.当前,能够同时应对这三种问题的研究工作相对较少.对此,本文从自然世界的侵蚀现象中汲取灵感,提出侵蚀聚类(Erosion Clustering,EC)算法.本算法引入动态密度估计方法和侵蚀策略,逐层识别和剔除位于类簇边界上的数据,进而发现各个类簇潜在的核心区域;采用基于互可达图的聚类方法实现核心区域的聚类;设计基于局部密度峰值的分配方式完成边界数据的划分.在12个基准数据集上的实验结果表明,EC算法的聚类性能比7种对比算法分别在修正兰德指标、修正互信息、
F
1
分数上平均提高了96%、53%和36%.
Density-based clustering is a classical algorithm in cluster analysis
which can find non-spherical clusters without specifying the number of clusters in advance. In the real-world scene
there are still some issues
including unclear boundaries between clusters
varying densities of data
and complex cluster shapes. Most existing density-based clustering algorithms do not tackle these problems in a unified way. We counter this difficulty by taking inspiration from the natural erosion phenomenon to present erosion clustering (EC). Firstly
the proposed dynamic density evaluation method is integrated into the erosion strategy
which identifies and removes the data on the cluster boundary layer by layer
revealing the cores of the latent clusters. After that
a mutual-reachability-graph-based clustering is used to group the core data. Finally
the allocation strategy based on the local density peak is designed to associate the eroded data to different clusters. The experimental results on 12 benchmark datasets demonstrate that the clustering performance of the proposed EC algrithm is improved by an average of 96%
53%
and 36% in the adjusted Rand index
adjusted mutual information
and
F
1
score
respectively
comparing with the other seven algrithms.
HOTELLING H . A Generalized T Test and measure of multivariate dispersion [M ] // Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability . Berkeley : University of California Press , 1951 : 23 - 42 .
XU L , JORDAN M I . On convergence properties of the EM algorithm for Gaussian mixtures [J ] . Neural Computation , 1996 , 8 ( 1 ): 129 - 151 .
SHI J B , MALIK J . Normalized cuts and image segmentation [J ] . IEEE Transactions on Pattern Analysis and Machine Intelligence , 2000 , 22 ( 8 ): 888 - 905 .
BEN-HUR A , HORN D , SIEGELMANN H T , et al . Support vector clustering [J ] . Journal of Machine Learning Research , 2001 , 2(Dec): 125- 137 .
WANG C D , LAI J H . Position regularized support vector domain description [J ] . Pattern Recognition , 2013 , 46 ( 3 ): 875 - 884 .
PENG C , ZHANG Q , KANG Z , et al . Kernel two-dimensional ridge regression for subspace clustering [J ] . Pattern Recognition , 2021 , 113 : 107749 .
PENG C , ZHANG Y Q , CHEN Y Y , et al . Log-based sparse nonnegative matrix factorization for data representation [J ] . Knowledge-Based Systems , 2022 , 251 : 109127 .
PENG C , ZHANG J , CHEN Y Y , et al . Preserving bilateral view structural information for subspace clustering [J ] . Knowledge-Based Systems , 2022 , 258 : 109915 .
ESTER M , H-P KRIEGEL , SANDER J , et al . A density-based algorithm for discovering clusters in large spatial databases with noise [C ] // Proceedings of the 2nd ACM International Conference on Knowledge Discovery and Data Mining . New York : ACM , 1996 : 226 - 231 .
CHENG Y Z . Mean shift, mode seeking, and clustering [J ] . IEEE Transactions on Pattern Analysis and Machine Intelligence , 1995 , 17 ( 8 ): 790 - 799 .
RODRIGUEZ A , LAIO A . Clustering by fast search and find of density peaks [J ] . Science , 2014 , 344 ( 6191 ): 1492 - 1496 .
CHOWDHURY S , NA H L , CORDEIRO DE AMORIM R . Feature weighting in DBSCAN using reverse nearest neighbours [J ] . Pattern Recognition , 2023 , 137 : 109314 .
DING S F , DU W , XU X , et al . An improved density peaks clustering algorithm based on natural neighbor with a merging strategy [J ] . Information Sciences , 2023 , 624 : 252 - 276 .
SUN C , DU M J , SUN J R , et al . A three-way clustering method based on improved density peaks algorithm and boundary detection graph [J ] . International Journal of Approximate Reasoning , 2023 , 153 : 239 - 257 .
ANKERST M , BREUNIG M M , KRIEGEL H P , et al . OPTICS: Ordering points to identify the clustering structure [C ] // Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data . New York : ACM , 1999 : 49 - 60 .
CAMPELLO R J G B , MOULAVI D , SANDER J . Density-based clustering based on hierarchical density estimates [M ] // Lecture Notes in Computer Science . Berlin : Springer , 2013 : 160 - 172 .
QIAN L , PLANT C , BOHM C . Density-based clustering for adaptive density variation [C ] // 2021 IEEE International Conference on Data Mining (ICDM) . Piscataway : IEEE , 2021 : 1282 - 1287 .
徐晓 , 丁世飞 , 丁玲 . 密度峰值聚类算法研究进展 [J ] . 软件学报 , 2022 , 33 ( 5 ): 1800 - 1816 .
XU X , DING S F , DING L . Survey on density peaks clustering algorithm [J ] . Journal of Software , 2022 , 33 ( 5 ): 1800 - 1816 . (in Chinese)
DU M J , DING S F , JIA H J . Study on density peaks clustering based on k -nearest neighbors and principal component analysis [J ] . Knowledge-Based Systems , 2016 , 99 : 135 - 145 .
LOTFI A , MORADI P , BEIGY H . Density peaks clustering based on density backbone and fuzzy neighborhood [J ] . Pattern Recognition , 2020 , 107 : 107449 .
GUO W J , WANG W H , ZHAO S P , et al . Density Peak Clustering with connectivity estimation [J ] . Knowledge-Based Systems , 2022 , 243 : 108501 .
WANG Y Z , WANG D , ZHANG X F , et al . McDPC: Multi-center density peak clustering [J ] . Neural Computing and Applications , 2020 , 32 ( 17 ): 13465 - 13478 .
GUAN J Y , LI S , HE X X , et al . Clustering by fast detection of main density peaks within a peak digraph [J ] . Information Sciences , 2023 , 628 : 504 - 521 .
GUAN J Y , LI S , CHEN X J , et al . DEMOS: Clustering by pruning a density-boosting cluster tree of density mounts [J ] . IEEE Transactions on Knowledge and Data Engineering , 2023 , 35 ( 10 ): 10814 - 10830 .
CHEN B , TING K M , WASHIO T , et al . Local contrast as an effective means to robust clustering against varying densities [J ] . Machine Learning , 2018 , 107 ( 8 ): 1621 - 1645 .
CHEN J G , YU P S . A domain adaptive density clustering algorithm for data with varying density distribution [J ] . IEEE Transactions on Knowledge and Data Engineering , 2021 , 33 ( 6 ): 2310 - 2321 .
WANG Y Z , WANG D , ZHOU Y , et al . VDPC: Variational density peak clustering algorithm [J ] . Information Sciences , 2023 , 621 : 627 - 651 .
LI R J , YANG X F , QIN X L , et al . Local gap density for clustering high-dimensional data with varying densities [J ] . Knowledge-Based Systems , 2019 , 184 : 104905 .
AVERBUCH-ELOR H , BAR N , COHEN-OR D . Border-peeling clustering [J ] . IEEE Transactions on Pattern Analysis and Machine Intelligence , 2020 , 42 ( 7 ): 1791 - 1797 .
DING S , DU M , SUN T , et al . An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood [J ] . Knowledge-Based Systems , 2017 , 133 : 294 - 313 .
0
Views
4
下载量
0
CSCD
Publicity Resources
Related Articles
Related Author
Related Institution
京公网安备11010802024621