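1. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
2. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, Jilin 130033, China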
Print publication: 2011
LI Xiong-fei, SUN Tao, WU Jia-wei. Clustering Algorithm Concerning Vector Influence Between Objects[J]. Acta Electronica Sinica, 2011, 39(6): 1347-1352.
From the perspective of universal gravitation, the mutual influence between particles depends on both distance and direction. This paper discusses the vector influence between data objects and applies it in a clustering algorithm, VICA. Building on a scalar influence function and a direction influence function for objects, it introduces the concept of a vector influence function and gives two methods for determining the direction influence function: the direction similarity method and the accumulation (sum) method. The objects in a core object's neighborhood are projected and the resulting vectors are normalized to unit length, so that the uniformity of the influence the core object receives from its neighborhood can be assessed. Only uniformly influenced core objects are expanded: objects that are uniform-influence density-reachable from such a core object are grouped into one cluster. Theoretical analysis and experimental results indicate that the algorithm discovers clusters of arbitrary shape and effectively eliminates sparsely influenced objects, such as boundary sparse points, as noise. It also addresses several difficulties in clustering high-dimensional data: indistinct boundaries between clusters, uneven density distribution, a large number of noise objects near cluster boundaries, and the tendency of the difference between the distances to a point's nearest and farthest neighbors to vanish. As a result, the algorithm improves clustering accuracy, gives better clustering results on a variety of data sets, and runs efficiently. Because the influence function is defined in a general form, the algorithm is general and extensible. Boundary sparse objects tend to appear when semi-structured data are transformed into Euclidean space, and VICA handles this noise effectively. The algorithm is therefore suitable for large-scale, high-dimensional data sets and can also be applied to clustering semi-structured data.
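The abstract does not state the influence functions themselves, so the following is only a minimal sketch of one plausible reading of the accumulation (sum) idea: the vectors from a core object to its neighbors are normalized to unit length and summed, and a short resultant is taken to mean the core object is influenced evenly from all directions. The names `direction_imbalance` and `is_uniformly_influenced` and the parameters `eps`, `min_pts`, and `max_imbalance` are hypothetical and are not taken from the paper.

```python
import math


def direction_imbalance(core, neighbors):
    """Length of the mean unit vector from `core` to its neighbors.

    0.0 means the neighbors pull evenly in all directions;
    1.0 means they all lie in the same direction.
    (Illustrative measure, not the paper's definition.)
    """
    acc = [0.0] * len(core)
    count = 0
    for p in neighbors:
        diff = [a - b for a, b in zip(p, core)]
        length = math.hypot(*diff)
        if length == 0.0:
            continue  # skip points that coincide with the core object
        acc = [s + d / length for s, d in zip(acc, diff)]
        count += 1
    if count == 0:
        return 1.0  # assumption: an empty neighborhood counts as non-uniform
    return math.hypot(*acc) / count


def is_uniformly_influenced(core, points, eps, min_pts, max_imbalance):
    """Density test (at least `min_pts` neighbors within `eps`) plus
    a direction test (imbalance at most `max_imbalance`)."""
    neighbors = [p for p in points if 0.0 < math.dist(p, core) <= eps]
    return (len(neighbors) >= min_pts
            and direction_imbalance(core, neighbors) <= max_imbalance)


# A 3x3 grid: the center is surrounded, an edge point is one-sided.
grid = [(x, y) for x in range(3) for y in range(3)]
center_ok = is_uniformly_influenced((1, 1), grid, eps=1.0, min_pts=3, max_imbalance=0.3)
edge_ok = is_uniformly_influenced((0, 1), grid, eps=1.0, min_pts=3, max_imbalance=0.3)
```

Under these assumptions the grid's center passes both tests, while the edge point has enough neighbors but fails the direction test. This mirrors the abstract's claim that boundary sparse objects are not expanded into clusters.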