Acta Electronica Sinica ›› 2017, Vol. 45 ›› Issue (4): 769-776. DOI: 10.3969/j.issn.0372-2112.2017.04.001

• Research Paper •

An Overlapping Microblog Community Detection Algorithm via Core Tags
(基于核心标签的可重叠微博网络社区划分方法)

MA Hui-fang (马慧芳)1,2, XIE Meng (谢蒙)1, HE Ting-nian (何廷年)1,3, LIN Xiang-hong (蔺想红)1

  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China;
    2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
    3. College of Information Science and Technology, Beijing Normal University, Beijing 100875, China
  • Received: 2016-01-08 Revised: 2016-08-01 Online: 2017-04-25 Published: 2017-04-25
    • About the authors:
    • MA Hui-fang, female, born in July 1981 in Lanzhou, Gansu. Ph.D. and master's supervisor, currently an associate professor at the College of Computer Science and Engineering, Northwest Normal University. Research interests: artificial intelligence, data mining, and machine learning. E-mail: mahuifang@yeah.net; XIE Meng, male, born in June 1990 in Xingtai, Hebei. M.S., College of Computer Science and Engineering, Northwest Normal University. Research interests: Internet data mining and machine learning. E-mail: xiemengh@hotmail.com
    • Supported by:
    • National Natural Science Foundation of China (No.61363058, No.61163039); Youth Science and Technology Fund of Gansu Province (No.145RJYA259, No.1606RJYA269); Natural Science Research Fund of Gansu Province (No.145RJZA232); Open Fund of Key Laboratory of Intelligent Information Processing of Institute of Computing Technology, Chinese Academy of Sciences (No.IIP2014-4)

Abstract:

Traditional microblog community detection algorithms produce communities with low cohesion and offer no control over the degree of overlap. To address these problems, this paper proposes Tag Cut, a top-down (divisive) strategy for detecting overlapping microblog communities based on core tags. First, tags are weighted using the co-occurrence relationships among user tags and the inverse user frequency, and each user's tag set is expanded based on the intra- and inter-correlations between tags. Next, the users in the overall community who carry a given tag are extracted as a temporary group, and an evaluation function scores the quality of the resulting partition. Finally, the most suitable core tag is selected, and the distance between its group and the other groups determines whether that group becomes a new group or is merged into an existing one. This procedure is iterated until the stopping requirements are met. The detected communities consist of groups that each share core tags, and the partition is refined using users' declared and implicit interests, the following patterns among users, and the practical usefulness of the results. Experiments on real data show that the method yields highly cohesive communities with a controllable degree of overlap, and that the resulting communities have practical significance.

Key words: microblog network, overlapping community detection, core tag, user attention relationship, tag cut
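
The tag weighting and temporary grouping steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so the code below assumes an IDF-style inverse user frequency, log(N / n_t), where N is the number of users and n_t the number of users carrying tag t. The function names and the toy data are hypothetical, and the co-occurrence weighting, tag expansion, evaluation function, and distance-based merge rule are not modeled.

```python
import math
from collections import defaultdict


def inverse_user_frequency(user_tags):
    """Assumed IDF-style weight per tag: log(N / n_t).

    N is the number of users and n_t the number of users carrying tag t.
    This formula is an assumption, not taken from the paper.
    """
    n_users = len(user_tags)
    counts = defaultdict(int)
    for tags in user_tags.values():
        for tag in set(tags):  # count each user at most once per tag
            counts[tag] += 1
    return {tag: math.log(n_users / n) for tag, n in counts.items()}


def temporary_groups(user_tags):
    """Map each tag to the set of users carrying it.

    A user with several tags appears in several groups, which is how
    overlap between groups can arise.
    """
    groups = defaultdict(set)
    for user, tags in user_tags.items():
        for tag in tags:
            groups[tag].add(user)
    return dict(groups)


# Toy data, invented for illustration only.
user_tags = {
    "u1": ["music", "travel"],
    "u2": ["music", "food"],
    "u3": ["travel"],
    "u4": ["music"],
}

weights = inverse_user_frequency(user_tags)
groups = temporary_groups(user_tags)
```

In this toy data, the rare tag `food` gets a higher weight than the common tag `music`, and user `u1` belongs to both the `music` and `travel` groups.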
