Acta Electronica Sinica ›› 2018, Vol. 46 ›› Issue (11): 2612-2618. DOI: 10.3969/j.issn.0372-2112.2018.11.007

• Research Paper •

Leveraging Tag Mean Partition Distance and Social Structure for Overlapping Microblog User Community Detection

MA Hui-fang1,2, CHEN Hai-bo1, ZHAO Wei-zhong3, BING Rui1, HUANG Le-le1

  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China;
  2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China;
  3. College of Information Engineering, Xiangtan University, Xiangtan, Hunan 411105, China
  • Received: 2017-10-24 Revised: 2018-01-07 Online: 2018-11-25 Published: 2018-11-25
  • About the authors: MA Hui-fang, female, born in July 1981 in Lanzhou, Gansu. Ph.D., master's supervisor, currently an associate professor at the College of Computer Science and Engineering, Northwest Normal University. Her research interests are data mining and machine learning. E-mail: mahuifang@yeah.net; CHEN Hai-bo, male, born in January 1993 in Zibo, Shandong. Master's student at the College of Computer Science and Engineering, Northwest Normal University. His research interest is machine learning. E-mail: 605423127@QQ.com
  • Supported by:
    National Natural Science Foundation of China (No.61762078, No.61762080); Research Project of the Guangxi Key Laboratory of Trusted Software (No.kx201705)

Abstract: This paper proposes an overlapping community detection algorithm for microblog users that combines tag mean partition distance with social structure. First, drawing on information theory and the notion of distance, a community pre-partition algorithm based on the mean partition distance (MPD) of core tags is defined. Next, a structure attribute vector is defined from each user's following and follower relationships, and the structure dissimilarity between users is computed from it. The core-tag mean partition distance and the user structure dissimilarity are then combined through an adjustable weight to give the comprehensive division dissimilarity (CDS). Finally, in each iteration, the subgroup produced by the tag with the lowest comprehensive division dissimilarity is taken as a new community. Experiments show that the proposed method can identify overlapping communities and has practical significance.
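
The weighting and selection steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption: the combination is taken to be a convex weighting with a hypothetical parameter `alpha`, the per-tag mean partition distance and structure dissimilarity are treated as precomputed inputs, and the function names `combined_dissimilarity` and `pick_next_community` are invented for illustration. This is not the authors' implementation.

```python
from typing import Dict, Set, Tuple

# Hypothetical input layout (an assumption, not from the paper):
# tag -> (mean partition distance, structure dissimilarity, users in the tag's subgroup)
Candidates = Dict[str, Tuple[float, float, Set[str]]]


def combined_dissimilarity(mpd: float, structure_dis: float, alpha: float) -> float:
    """Assumed convex combination; alpha in [0, 1] weights the tag-based term."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * mpd + (1.0 - alpha) * structure_dis


def pick_next_community(candidates: Candidates, alpha: float = 0.5) -> Tuple[str, Set[str]]:
    """Return the tag with the lowest combined score and the subgroup it induces."""
    best_tag = min(
        candidates,
        key=lambda tag: combined_dissimilarity(
            candidates[tag][0], candidates[tag][1], alpha
        ),
    )
    return best_tag, candidates[best_tag][2]


# Toy, made-up scores for illustration only.
candidates: Candidates = {
    "music": (0.2, 0.6, {"u1", "u2"}),
    "sports": (0.5, 0.1, {"u2", "u3"}),
}
tag, members = pick_next_community(candidates, alpha=0.5)
```

In this toy input, user `u2` belongs to the subgroups of both tags, so repeating the selection step over successive iterations can place the same user in more than one community, which is what makes the resulting partition overlapping.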

Key words: overlapping community detection, core tag, mean partition distance (MPD), structure dissimilarity, comprehensive division dissimilarity (CDS)

CLC Number: