Acta Electronica Sinica ›› 2022, Vol. 50 ›› Issue (3): 718-725. DOI: 10.12263/DZXB.20201146

• Academic Paper •

Structural α-Entropy Weighting Gaussian Mixture Model for Subspace Clustering

LI Kai1,2, ZHANG Ke-xin1

  1. School of Cyber Security and Computer, Hebei University, Baoding, Hebei 071002, China
  2. Hebei Machine Vision Engineering Research Center, Baoding, Hebei 071002, China
  • Received: 2020-10-18 Revised: 2021-03-30 Online: 2022-03-25 Published: 2022-03-25
  • About the authors: LI Kai, male, born in 1963 in Baoding, Hebei Province. He is a professor at Hebei University, working on machine learning, pattern recognition, and data mining. E-mail: likai@hbu.edu.cn
    ZHANG Ke-xin, female, born in 1994 in Handan, Hebei Province. She works on machine learning and data mining. E-mail: 306432139@qq.com
  • Supported by:
    Natural Science Foundation of Hebei Province (F2018201060)

Abstract:

Subspace clustering methods that use information entropy or fuzzy entropy to determine the relevant features of each cluster handle high-dimensional data well. To further improve clustering performance, the negative structural α-entropy of the weight vector is introduced into the Gaussian mixture model, yielding a structural α-entropy weighted Gaussian mixture model for subspace clustering. Based on this model, a subspace clustering algorithm, SEWMM (Structural α-Entropy Weighting Mixture Model), is derived. The algorithm can not only discover clusters located in different subspaces of a high-dimensional data space, but also obtain clusters of different shapes and volumes within those subspaces. The convergence and time complexity of the algorithm are also analyzed. SEWMM is evaluated on UCI (University of California, Irvine) standard data sets and image data sets and compared with several representative clustering algorithms; the results show that the proposed algorithm achieves a certain improvement in overall performance.

Key words: fuzzy entropy, structural α-entropy, feature weighting, Gaussian mixture model, high-dimensional data, subspace clustering
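
The abstract names the structural α-entropy of a cluster's feature-weight vector as the key ingredient but does not give its formula. As a rough illustration only, the sketch below assumes the classical Havrda-Charvát form, H_α(w) = (Σ w_i^α − 1) / (2^(1−α) − 1) with α > 0 and α ≠ 1; the paper's exact normalization and the way the negative entropy enters the SEWMM objective may differ. The function name and the example weight vectors are hypothetical. In entropy-weighted clustering generally, a term of this kind is typically used to keep the weights from collapsing onto a single feature.

```python
import math


def structural_alpha_entropy(weights, alpha):
    """Havrda-Charvat structural alpha-entropy of a weight vector.

    Assumed form (not taken from the paper):
        H_alpha(w) = (sum_i w_i**alpha - 1) / (2**(1 - alpha) - 1)
    defined for alpha > 0, alpha != 1, and weights that sum to 1.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    power_sum = sum(w ** alpha for w in weights)
    return (power_sum - 1.0) / (2.0 ** (1.0 - alpha) - 1.0)


# Hypothetical feature-weight vectors for one cluster over four features.
uniform = [0.25, 0.25, 0.25, 0.25]   # every feature equally relevant
peaked = [0.85, 0.05, 0.05, 0.05]    # one dominant feature
one_hot = [1.0, 0.0, 0.0, 0.0]       # a single feature carries all the weight

for name, w in [("uniform", uniform), ("peaked", peaked), ("one_hot", one_hot)]:
    print(name, structural_alpha_entropy(w, alpha=2.0))
```

Under this assumed form the entropy is largest for the uniform vector and zero for the one-hot vector, so subtracting it from an objective that is minimized (that is, adding the negative entropy) favors weight vectors that spread over several features.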

CLC number: