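1. School of Information Science and Engineering, Shenyang University of Technology, Shenyang, Liaoning 110870, China
2. School of Computer Science and Technology, Hebei University, Baoding, Hebei 071002, China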
Published online: 2018-03-25
Print publication: 2018
LU Jing, DUAN Yong, LIU Hai-bo. Distributed Density Peaks Clustering Based on z-Value[J]. Acta Electronica Sinica, 2018, 46(3): 730-738. DOI: 10.3969/j.issn.0372-2112.2018.03.031.
Density peaks clustering has attracted wide attention because it can find clusters of arbitrary shape and does not require the number of clusters to be specified in advance. However, the algorithm must compute the density of every point in the data set and the distance between every pair of points, so it is poorly suited to large-scale, high-dimensional data sets. To improve efficiency and scalability, we propose DP-z, a distributed density peaks clustering algorithm based on z-values. DP-z uses a z-order space-filling curve to map points in multidimensional space onto one dimension and then splits the data set into groups according to the points' z-values. To obtain correct results, data must be exchanged among the groups; during this exchange DP-z applies a filtering strategy based on the properties of the points' z-values, which avoids a large number of useless distance computations and much of the data transfer cost. The density and distance value of each point are then computed in parallel. Finally, we evaluate DP-z on a cloud computing platform. The experiments show that DP-z produces the same clustering results as the original density peaks clustering algorithm while improving execution efficiency.
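The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the DP-z implementation from the paper: it only shows (a) one common way to compute a z-value by interleaving coordinate bits and to cut the z-sorted points into contiguous groups, and (b) a brute-force computation of the density and distance value in the style of the original density peaks algorithm with a cutoff kernel. The function names, the `bits` and `dc` parameters, and the assumption of non-negative integer coordinates are illustrative choices, and the inter-group data exchange and filtering strategy are not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def z_value(coords: Sequence[int], bits: int = 8) -> int:
    """Interleave the low `bits` bits of each non-negative integer coordinate."""
    dims = len(coords)
    z = 0
    for bit in range(bits):
        for d, c in enumerate(coords):
            # Bit `bit` of dimension `d` lands at position bit * dims + d.
            z |= ((c >> bit) & 1) << (bit * dims + d)
    return z


def partition_by_z(points: Sequence[Point], groups: int, bits: int = 8) -> List[List[Point]]:
    """Sort points by z-value and cut the order into contiguous groups.

    Assumption: equal-size groups; the paper's actual grouping rule may differ.
    """
    ordered = sorted(points, key=lambda p: z_value(p, bits))
    size = -(-len(ordered) // groups)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def density_and_delta(points: Sequence[Point], dc: float) -> Tuple[List[int], List[float]]:
    """Brute-force O(n^2) density (rho) and distance value (delta).

    rho[i]   = number of other points closer than the cutoff distance dc.
    delta[i] = distance to the nearest point of strictly higher density, or
               the largest distance from i when no denser point exists.
    """
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 2)]
    print(partition_by_z(pts, groups=3))
    print(density_and_delta([(0, 0), (1, 0), (2, 0), (10, 0)], dc=1.5))
```

The quadratic distance table in `density_and_delta` is the cost that makes the original algorithm impractical on large data sets. Grouping by z-value keeps points that are adjacent along the curve in the same group, but points that are close in space can still fall into different groups, which is why the abstract describes a data exchange step between groups.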