Acta Electronica Sinica, 2019, Vol. 47, Issue (1): 145-152. DOI: 10.3969/j.issn.0372-2112.2019.01.019

Collection: Outstanding Papers (2022)

• Research Paper

The Overlapping Community Discovery Algorithm Based on Compact Structure

PAN Jian-fei1,2, DONG Yi-hong1, CHEN Hua-hui1, QIAN Jiang-bo1, DAI Ming-yang2

  1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China;
    2. Baidu Online Technology Co., Ltd., Beijing 100084, China
  • Received: 2018-01-29 Revised: 2018-08-01 Online: 2019-01-25 Published: 2019-01-25
    • Corresponding author:
    • DONG Yi-hong
    • About the author:
    • PAN Jian-fei, male, born in 1991, is a master's student at the Faculty of Electrical Engineering and Computer Science, Ningbo University, and an algorithm engineer at Baidu. His research interests include big data and data mining.
    • Supported by:
    • National Natural Science Foundation of China (No.61572266, No.61472194); Natural Science Foundation of Zhejiang Province, China (No.LY16F020003); Ningbo Natural Science Foundation (No.2017A610114)

Abstract: As networks keep growing in size and complexity, traditional overlapping community discovery algorithms can no longer process large-scale network data effectively or discover reasonable community structures. This paper proposes the concept of vertex gravity and introduces vertex cohesion and community cohesion as criteria for judging whether a community has a compact internal structure and a sparse external structure. On this basis, it constructs OCSC, an overlapping community discovery algorithm based on structural compactness. OCSC proceeds in three steps: preprocessing, core subgraph partitioning, and core community expansion, and it can effectively discover overlapping communities. Community discovery experiments on synthetic and real-world networks, evaluated with metrics including NMI and F1 score, verify the rationality and superiority of OCSC.

Key words: community discovery, overlapping community, core community, large-scale network structure, Spark
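The abstract cites F1 score as an evaluation metric but does not state which overlapping-community variant is used. As a hedged illustration only, the sketch below computes one commonly used variant: each ground-truth community is matched to its best-scoring detected community and vice versa, and the two average best-match F1 values are averaged. The function names (`pair_f1`, `best_match_avg`, `overlapping_f1`) and the toy data are hypothetical; this is not the authors' evaluation code.

```python
from typing import Hashable, List, Set

# Hypothetical type alias: a community is a set of vertex IDs.
Community = Set[Hashable]


def pair_f1(a: Community, b: Community) -> float:
    """F1 between two vertex sets; for sets this equals 2|a & b| / (|a| + |b|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def best_match_avg(src: List[Community], dst: List[Community]) -> float:
    """Average, over communities in src, of the best F1 against any community in dst."""
    return sum(max(pair_f1(c, d) for d in dst) for c in src) / len(src)


def overlapping_f1(truth: List[Community], detected: List[Community]) -> float:
    """Symmetric best-match F1 (an assumed common variant, not taken from the paper)."""
    if not truth or not detected:
        return 0.0
    return 0.5 * (best_match_avg(truth, detected) + best_match_avg(detected, truth))


if __name__ == "__main__":
    # Toy data: vertex 4 belongs to both ground-truth communities,
    # but the detected cover assigns it to only one of them.
    truth = [{1, 2, 3, 4}, {4, 5, 6, 7}]
    detected = [{1, 2, 3, 4}, {5, 6, 7}]
    print(round(overlapping_f1(truth, detected), 4))  # 0.9286
```

Averaging in both directions penalizes both missed ground-truth communities and spurious detected ones; a one-directional average from ground truth to detected communities can be inflated simply by reporting many extra communities.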
