电子学报 ›› 2015, Vol. 43 ›› Issue (9): 1745-1749.DOI: 10.3969/j.issn.0372-2112.2015.09.010

• 学术论文 • 上一篇    下一篇

针对RDF概率图查询的基数估计方法

章登义, 吴文李, 欧阳黜霏   

  1. 武汉大学计算机学院, 湖北武汉 430072
  • 收稿日期:2014-05-23 修回日期:2014-10-08 出版日期:2015-09-25 发布日期:2015-09-25
  • 通讯作者: 吴文李
  • 作者简介:章登义 男,1965年出生于湖北省荆州市.现为武汉大学计算机学院教授、博士生导师.E-mail:dyzhangwhu@163.com

Cardinality Estimation in Query for Probability RDF Graphs

ZHANG Deng-yi, WU Wen-li, OUYANG Chu-fei   

  1. School of Computer, Wuhan University, Wuhan, Hubei 430072, China
  • Received:2014-05-23 Revised:2014-10-08 Online:2015-09-25 Published:2015-09-25

摘要:

资源描述框架图查询中,准确估计查询结果的大小是查询优化器中的关键步骤.已有方法忽略了该图自身的不确定性以及子查询间的关联关系,无法有效估计结果.针对该问题,本文提出一种基于贝叶斯模型的基数估计方法.该方法引入贝叶斯网络模型,挖掘出子查询内的属性依赖.同时,在这些属性依赖的基础上提出子网拼接方法,计算出子查询间的影响因子.最后,利用以上信息准确估计出任意查询结果集的基数.实验表明:与已有方法相比,本文方法的准确性提高15%以上,性能没有大幅度下降.

关键词: 不确定资源描述框架图, 查询处理, 选择基数估计, 查询优化

Abstract:

In RDF(Resource Description Framework) graph query,accurately estimating the size of the query result is a crucial step to the query optimizer.The previous work,which ignores both the uncertainty of RDF graph itself and the correlations between subqueries,is difficult to obtain accurate estimations.To solve this problem,this paper proposes an estimation method based on Bayesian probability model.Our method introduces Bayesian network model for subqueries to dig out the dependencies between properties in subqueries.At the meanwhile,based on these dependencies we propose a connection approach of subnets to compute the impact factors between subqueries.Finally,we exploit the above information to accurately estimate the cardinality of the result about an arbitrary query.The experiments indicate that the accuracy of our estimation results is improved by over 15% and that the query run-time is not increased significantly in comparison with the previous art.

Key words: uncertain RDF graph, query processing, selectivity estimation, query optimization

中图分类号: