School of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi 710062, China
Published online: 2019-05-25
Published in print: 2019
XIE Juan-ying, DING Li-juan. The True Self-adaptive Spectral Clustering Algorithms[J]. Acta Electronica Sinica, 2019, 47(5): 1000-1008. DOI: 10.3969/j.issn.0372-2112.2019.05.004.
The local scaling parameter σ_i of the self-tuning spectral clustering algorithm can be distorted by outliers, and the K-means step that self-tuning relies on is unstable; both problems degrade the clustering results. To address them, two fully self-adaptive spectral clustering algorithms are proposed: SC_SD (Spectral Clustering based on Standard Deviation) and SC_MD (Spectral Clustering based on Mean Distance). They take, respectively, the standard deviation of point i and the mean distance from point i to all other points as the neighborhood radius of point i, count the points that fall inside that neighborhood, and use the standard deviation of point i within the neighborhood as its local scaling parameter σ_i. This keeps outliers from influencing σ_i and, in turn, from distorting the clustering results. In addition, the SD_K-medoids algorithm, which optimizes the initial cluster centers by variance, replaces K-means in self-tuning to overcome its instability and recover the true distribution of the data.

Experiments on UCI datasets and synthetic datasets show that SC_SD and SC_MD obtain better clustering results than the traditional spectral clustering algorithm NJW and the self-tuning spectral clustering algorithm, are robust to noise, and scale well. Both algorithms can discover the true distribution of a dataset fully self-adaptively, without any given information, and SC_MD in particular is well suited to clustering relatively large datasets.
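The abstract describes the neighborhood-based local scaling parameter only in words, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. The function names (`pairwise_distances`, `local_scales`, `self_tuning_affinity`) and the `radius_kind` switch are hypothetical. The "standard deviation of point i" is assumed to be the root-mean-square distance from point i to the other points, and the "neighborhood standard deviation" is assumed to be the same quantity restricted to the neighbors. The affinity uses the self-tuning form exp(-d_ij^2 / (sigma_i * sigma_j)). The eigen-decomposition and the SD_K-medoids step are not shown.

```python
import math


def pairwise_distances(points):
    """Return the full Euclidean distance matrix as a list of lists."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
            for p in points]


def local_scales(dist, radius_kind="mean"):
    """Per-point local scale from a data-driven neighborhood (needs >= 2 points).

    radius_kind="mean": radius is the mean distance to all other points
                        (SC_MD-style, as worded in the abstract).
    radius_kind="std":  radius is the RMS distance to all other points
                        (ASSUMED reading of "standard deviation of point i").
    """
    n = len(dist)
    sigmas = []
    for i in range(n):
        others = [dist[i][j] for j in range(n) if j != i]
        if radius_kind == "mean":
            radius = sum(others) / len(others)
        elif radius_kind == "std":
            radius = math.sqrt(sum(d * d for d in others) / len(others))
        else:
            raise ValueError("radius_kind must be 'mean' or 'std'")
        # Neighbors are the other points inside the radius; fall back to the
        # nearest point if rounding leaves the list empty.
        neighbors = [d for d in others if d <= radius] or [min(others)]
        # ASSUMED "neighborhood standard deviation": RMS distance to neighbors.
        sigma = math.sqrt(sum(d * d for d in neighbors) / len(neighbors))
        sigmas.append(max(sigma, 1e-12))  # guard against duplicate points
    return sigmas


def self_tuning_affinity(dist, sigmas):
    """Locally scaled affinity exp(-d_ij^2 / (sigma_i * sigma_j)), zero diagonal."""
    n = len(dist)
    return [[0.0 if i == j
             else math.exp(-dist[i][j] ** 2 / (sigmas[i] * sigmas[j]))
             for j in range(n)] for i in range(n)]


# Two tight groups plus one distant point acting as an outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
dist = pairwise_distances(points)
sigmas = local_scales(dist, radius_kind="mean")
affinity = self_tuning_affinity(dist, sigmas)
```

On this toy input the distant point lies outside every other point's neighborhood, so it does not enter their scale estimates. The resulting `affinity` matrix would then feed the usual normalized-Laplacian and clustering steps.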