电子学报 ›› 2020, Vol. 48 ›› Issue (1): 110-117.DOI: 10.3969/j.issn.0372-2112.2020.01.013

• 学术论文 • 上一篇    下一篇

SCBT-index:基于谱编码的子图索引算法

施炜杰, 董一鸿, 钱江波, 陈华辉, 辛宇   

  1. 宁波大学信息科学与工程学院, 浙江宁波 315211
  • 收稿日期:2018-11-06 修回日期:2019-07-29 出版日期:2020-01-25 发布日期:2020-01-25
  • 通讯作者: 董一鸿
  • 作者简介:施炜杰 男,1993年生.CCF学生会员,2019年获宁波大学计算机技术专业硕士学位,主要研究方向为大数据、数据挖掘;陈华辉 男,1964年生于浙江宁波.博士,CCF会员,宁波大学教授,主要研究方向为数据库技术、流数据处理.E-mail:chenhuahui@nbu.edu.cn;钱江波 男,1974年生于浙江宁波.博士,CCF会员,宁波大学教授,主要研究方向为数据库技术、流数据处理、多维数据索引技术等.E-mail:qianjiangbo@nbu.edu.cn
  • 基金资助:
    国家自然科学基金(No.61572266,No.61602133);浙江省自然科学基金(No.LY20F020009,No.LZ20F020001)

SCBT-index: Subgraph Indexing Algorithm Based on Spectral Coding

SHI Wei-jie, DONG Yi-hong, QIAN Jiang-bo, CHEN Hua-hui, XIN Yu   

  1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China
  • Received:2018-11-06 Revised:2019-07-29 Online:2020-01-25 Published:2020-01-25

摘要: 随着图模型规模的扩大,单机算法难以适应大规模数据集下的子图查询.而现有的分布式算法基于无索引的简单遍历,join过程容易出现内存溢出,而且查询图分布异常时易出现负载不均衡.提出了一种基于谱编码的二叉索引树(SCBT-index),首先对数据图中的顶点谱编码,根据编码信息构建二叉索引树.然后对查询图使用最小查询计划进行分解,最后join过程使用3个剪枝策略:基于拓扑结构的预剪枝、序列化join和基于分布式下的join优化.实验结果表明,SCBT-index在图集下的综合性能优于现有主流算法,单图下的查询时间为现有算法的1/2到1/4.

关键词: 谱编码, Gini系数, 子图查询, 子图索引

Abstract: With the expansion of graph scale,single-machine algorithms can hardly adapt to the sub-graph queries in large-scale data sets.As existing distributed algorithms are based on simple traversal without index,the join process is prone to memory overflow in the distributed algorithms and load imbalance occurs when the query graph distribution is abnormal.Therefore,a binary index tree based on spectral coding named SCBT-index is proposed.Firstly,for vertex spectrum coding in the data graph,a binary index tree is constructed according to the coding information.Then,the query graph is decomposed using the minimum query plan.Finally,three pruning strategies are used in the join process:structure matching based on topological structure,serialized join and the distributed join optimization.The experimental results show the comprehensive performance of SCBT-index under the graph set is better than that of the popular algorithms.In addition,the query time under the single graph is 1/2 to 1/4 of that of the existing algorithms.

Key words: spectral coding, Gini, subgraph query, subgraph index

中图分类号: