School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi 330099, China
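[ "ZHAO Jia, male, was born in Jiujiang, Jiangxi Province, in September 1981. He is currently a professor and master's supervisor at Nanchang Institute of Technology. His main research interests are machine learning, data mining, and intelligent computing. E-mail: zhaojia925@163.com" ]
[ "WANG Gang, male, was born in Ganzhou, Jiangxi Province, in August 1995. He is currently a master's student at Nanchang Institute of Technology. His main research interests are machine learning and data mining. E-mail: wang691630202@163.com" ]
[ "LÜ Li, female, was born in Guixi, Jiangxi Province, in May 1982. She is currently a professor and master's supervisor at Nanchang Institute of Technology. Her main research interests are big data analysis and object tracking. E-mail: lvli@nit.edu.cn" ]
[ "FAN Tanghuai, male, was born in Jiujiang, Jiangxi Province, in November 1962. He is currently a professor and master's supervisor at Nanchang Institute of Technology. His main research interests are sensor information acquisition and processing, machine learning, and data mining. E-mail: fantanghuai@163.com" ]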
Received: 2021-09-17,
Revised: 2022-01-01,
Published in print: 2022-11-25
ZHAO Jia, WANG Gang, LÜ Li, et al. Density Peaks Clustering Algorithm Based on Geodesic Distance and Cosine Mutual Reverse Nearest Neighbors for Manifold Datasets[J]. ACTA ELECTRONICA SINICA, 2022, 50(11): 2730-2737. DOI: 10.12263/DZXB.20211273.
The density peaks clustering algorithm tends to select density peaks in spherically distributed data, whereas manifold data are mostly non-spherically distributed, so the algorithm cannot accurately find the cluster centers. Its allocation strategy gives priority to chain allocation of samples near the cluster centers, whereas many samples in manifold data lie far from their cluster centers, so samples that should belong to the same cluster are assigned incorrectly. To address this, this paper proposes a density peaks clustering algorithm based on geodesic distance and cosine mutual reverse nearest neighbors for manifold datasets. The algorithm combines K-nearest neighbors with geodesic distance to redefine local density, which highlights the difference between density peaks and non-density peaks and allows the cluster centers to be found accurately. It then combines mutual reverse nearest neighbors with cosine similarity to obtain a sample similarity matrix based on cosine mutual reverse nearest neighbors, which allocates samples accurately to manifold clusters. Experimental results show that the algorithm effectively discovers the geometric shape of manifold datasets and clusters them accurately, and that it also performs well on real-world datasets and image datasets.
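The two ingredients named in the abstract can be illustrated with a small standard-library sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: geodesic distance is approximated by shortest paths over a K-nearest-neighbor graph, the density is a hypothetical sum of exp(-g) over the K geodesically closest samples, and two samples count as mutual neighbors when each appears in the other's K-nearest list (so each is a reverse nearest neighbor of the other). The function names are likewise hypothetical and not taken from the paper.

```python
import heapq
import math


def knn_lists(points, k):
    """For each sample, the indices of its k nearest samples (Euclidean)."""
    n = len(points)
    return [
        [j for _, j in sorted((math.dist(points[i], points[j]), j)
                              for j in range(n) if j != i)[:k]]
        for i in range(n)
    ]


def knn_graph(points, neighbors):
    """Undirected graph {i: {j: edge_length}} built from the K-nearest lists."""
    graph = {i: {} for i in range(len(points))}
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            d = math.dist(points[i], points[j])
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_from(graph, source):
    """Dijkstra: shortest path length from source to every reachable sample."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def local_density(graph, k):
    """Assumed density: sum of exp(-g) over the k geodesically closest samples."""
    rho = []
    for i in graph:
        g = geodesic_from(graph, i)
        closest = sorted(d for j, d in g.items() if j != i)[:k]
        rho.append(sum(math.exp(-d) for d in closest))
    return rho


def cosine_mutual_similarity(points, neighbors):
    """Cosine similarity for pairs that appear in each other's K-nearest list."""
    sim = {}
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            if i < j and i in neighbors[j]:
                dot = sum(a * b for a, b in zip(points[i], points[j]))
                norm = math.hypot(*points[i]) * math.hypot(*points[j])
                sim[(i, j)] = dot / norm if norm else 0.0  # guard zero vectors
    return sim


# Toy U-shaped "manifold": the two ends are close in a straight line
# but far apart when travelling along the curve.
pts = [(1, 1), (1, 2), (1, 3), (2, 3), (3, 3), (4, 3), (4, 2), (4, 1)]
K = 2
nbrs = knn_lists(pts, K)
graph = knn_graph(pts, nbrs)
straight = math.dist(pts[0], pts[7])   # Euclidean distance between the ends
geo = geodesic_from(graph, 0)          # geodesic distances from the first end
rho = local_density(graph, K)
sim = cosine_mutual_similarity(pts, nbrs)
print(straight, geo[7])
```

On this toy data the straight-line distance between the two ends of the U is 3, while the graph-based geodesic distance is 7, which is the kind of gap that lets a geodesic measure follow a curved cluster. The exponential kernel and the raw-coordinate cosine are placeholders; the paper's own definitions should be substituted where available.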