HDVM: Compression & Query Model of Linked-Data Based on Relational Matrix

FU Hai-dong, PENG Shen, HUANG Li, GU Jin-guang

ACTA ELECTRONICA SINICA ›› 2018, Vol. 46 ›› Issue (3) : 721-729.

DOI: 10.3969/j.issn.0372-2112.2018.03.030

Abstract

With the arrival of the big data era, a large amount of RDF (Resource Description Framework) data is flooding the Web of Data. When RDF engines manage these huge datasets, the indexes cannot be fully loaded into main memory, so the systems must perform slow disk accesses to answer SPARQL queries. This paper proposes a method named HDVM (Header Dictionary Vector Matrix) that reduces repetition in linked data by extracting the latent triple relation matrix from the linked dataset and storing it as a subject vector, a predicate vector, and an object matrix, which allows SPARQL queries to be executed fully in memory without decompression. Experimental results show that, compared with HDT (Header-Dictionary Triples), the HDVM model improves the compression rate by 3% to 20%, and that query time on billion-scale datasets averages 400 milliseconds.
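
To make the storage idea concrete, the following is a minimal Python sketch of one possible reading of the layout named in the abstract. It is not the authors' implementation: the abstract does not specify the data structures, so the function names `build_model` and `match`, the sorted-dictionary encoding, and the choice of a sparse row-per-subject object matrix are all assumptions made for illustration. The sketch only shows how a triple pattern can be answered directly on the encoded form, without rebuilding the original triples first.

```python
def build_model(triples):
    """Encode (subject, predicate, object) string triples.

    Assumed layout (hypothetical, not taken from the paper):
      - subjects / predicates / objects: sorted term lists; a term's
        position in its list is its integer ID
      - matrix: one sparse row per subject, mapping a predicate ID to
        the set of object IDs linked through that predicate
    """
    subjects = sorted({s for s, _, _ in triples})
    predicates = sorted({p for _, p, _ in triples})
    objects = sorted({o for _, _, o in triples})
    s_idx = {term: i for i, term in enumerate(subjects)}
    p_idx = {term: i for i, term in enumerate(predicates)}
    o_idx = {term: i for i, term in enumerate(objects)}

    matrix = [dict() for _ in subjects]
    for s, p, o in triples:
        # Each subject and predicate term is stored once in its vector;
        # the matrix cell holds only integer object IDs.
        matrix[s_idx[s]].setdefault(p_idx[p], set()).add(o_idx[o])

    return {
        "subjects": subjects,
        "predicates": predicates,
        "objects": objects,
        "s_idx": s_idx,
        "matrix": matrix,
    }


def match(model, s=None, p=None, o=None):
    """Answer one triple pattern; None marks an unbound variable."""
    subjects = model["subjects"]
    predicates = model["predicates"]
    objects = model["objects"]

    if s is None:
        rows = range(len(subjects))
    elif s in model["s_idx"]:
        rows = [model["s_idx"][s]]
    else:
        return []

    results = []
    for i in rows:
        for j, cell in sorted(model["matrix"][i].items()):
            if p is not None and predicates[j] != p:
                continue
            for k in sorted(cell):
                if o is not None and objects[k] != o:
                    continue
                results.append((subjects[i], predicates[j], objects[k]))
    return results
```

Calling `match(model, s="ex:alice", p="foaf:knows")` on a model built from a handful of triples would correspond to the SPARQL pattern `ex:alice foaf:knows ?o`. A production design would presumably use compact bit-level encodings rather than Python sets, which this sketch does not attempt.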

Key words

relation matrix / linked-data / query / compression

Cite this article

FU Hai-dong, PENG Shen, HUANG Li, GU Jin-guang. HDVM: Compression & Query Model of Linked-Data Based on Relational Matrix[J]. Acta Electronica Sinica, 2018, 46(3): 721-729. https://doi.org/10.3969/j.issn.0372-2112.2018.03.030

Funding

National Natural Science Foundation of China (No.61673304, No.61272110); Major Project of the National Social Science Fund of China (No.11&ZD189); Open Fund of State Key Laboratory of Software Engineering, Wuhan University, China (No.SKLSE2012-09-07)