宁波大学信息科学与工程学院, 浙江宁波 315211
[ "俞加平 男,1995年生于浙江湖州.现为宁波大学信息科学与工程学院硕士研究生.主要研究方向为非关系型数据库、数据挖掘等. E-mail:1811082061@nbu.edu.cn" ]
[ "陈华辉(通信作者) 男,1964年生于浙江宁波.现为宁波大学教授、博士生导师.主要研究方向为数据流处理及大数据处理、数据挖掘等. E-mail:chenhuahui@nbu.edu.cn" ]
[ "钱江波 男,1974年生于浙江宁波. 现为宁波大学教授、博士生导师.主要研究方向为数据库管理、多维度索引及查询优化等. E-mail:qianjb@163.com" ]
[ "董一鸿 男,1969年生于浙江宁波. 现为宁波大学教授、硕士生导师.主要研究方向为大数据处理、数据挖掘及人工智能. E-mail:dongyihong@nbu.edu.cn" ]
收稿:2020-08-27,
修回:2021-04-14,
纸质出版:2021-11-25
移动端阅览
俞加平,陈华辉,钱江波等.LSM树中基于热度预测的异构布隆过滤器方案[J].电子学报,2021,49(11):2090-2095.
YU Jia-ping,CHEN Hua-hui,QIAN Jiang-bo,et al.A Heterogeneous Bloom Filter Scheme in LSM Tree Based on Hotness Prediction[J].ACTA ELECTRONICA SINICA,2021,49(11):2090-2095.
俞加平,陈华辉,钱江波等.LSM树中基于热度预测的异构布隆过滤器方案[J].电子学报,2021,49(11):2090-2095. DOI: 10.12263/DZXB.20200945.
YU Jia-ping,CHEN Hua-hui,QIAN Jiang-bo,et al.A Heterogeneous Bloom Filter Scheme in LSM Tree Based on Hotness Prediction[J].ACTA ELECTRONICA SINICA,2021,49(11):2090-2095. DOI: 10.12263/DZXB.20200945.
日志结构合并(Log-Structured-Merge,LSM)树中常使用布隆过滤器减少无效磁盘I/O.但是用户无法无限制地细化布隆过滤器的粒度,原因是在一些数据量庞大而数据项较小的工作流中,这些元数据需要占用大量存储空间.其次在一些内存受限的环境下,内存缓冲区无法容纳更多的过滤器数据,造成缓冲区与磁盘的频繁数据交换.针对上述问题本文提出LSM树中的异构布隆过滤器方案,在LSM树的每一层维护热度预测模型,新生成的SSTable通过预测的热度来分配不同粒度的布隆过滤器,然后使用特定缓存管理方案来维护缓存中的过滤器数据并处理工作流热度发生改变的情况.实验证明,本文的方案在保持相同外存占用与内存消耗的情况下,读取吞吐量比采用原始LSM树结构的LevelDB提升22%~53%.
Bloom filters are often used in the LSM(Log-Structured-Merge) tree to reduce unnecessary disk I/O. However
users cannot refine the granularity of Bloom filters infinitely
because in some workflows with huge data volume and small data items
these metadata require a lot of storage space. Secondly
in some memory-constrained environments
the cache cannot accommodate more filter data
resulting in frequent data exchange between memory and disk. In view of the above problems
we propose a heterogeneous Bloom filter scheme in LSM tree. The hotness prediction model is maintained at each layer of the LSM tree
and the newly generated disk components are distributed with different granularity according to the predicted hotness. And a specific cache management scheme is used to maintain the filter data in memory cache and deal with changes in workflow hotness. It is experimentally proven that the proposed scheme improves read throughput by 22%~53% over LevelDB with the original LSM tree structure while maintaining the same storage occupation and memory consumption.
O'Neil P , Cheng E , Gawlick D , et al . The log-structured merge-tree (LSM-tree) [J]. Acta Informatica , 1996 , 33 ( 4 ): 351 - 385 .
Dayan N , Idreos S . The log-structured merge-bush & the wacky continuum [A]. Proceedings of the 2019 International Conference on Management of Data [C]. New York, USA : ACM , 2019 . 449 - 466 .
Luo C , Carey M J . LSM-based storage techniques: a survey [J]. The VLDB Journal , 2020 , 29 ( 1 ): 393 - 418 .
Bloom B H . Space/time trade-offs in Hash coding with allowable errors [J]. Communications of the ACM , 1970 , 13 ( 7 ): 422 - 426 .
Chang F , Dean J , Ghemawat S , et al . Bigtable: a distributed storage system for structured data [J]. ACM Transactions on Computer Systems (TOCS) , 2008 , 26 ( 2 ): 1 - 26 .
Dayan N , Athanassoulis M , Idreos S . Monkey: optimal navigable key-value store [A]. Proceedings of the 2017 ACM International Conference on Management of Data [C]. New York, USA : ACM , 2017 . 79 - 94 .
Dayan N , Athanassoulis M , Idreos S . Optimal bloom filters and adaptive merging for LSM-trees [J]. ACM Transactions on Database Systems (TODS) , 2018 , 43 ( 4 ): 1 - 48 .
Li Y , Tian C , Guo F , et al . ElasticBF: elastic bloom filter with hotness awareness for boosting read performance in large key-value stores [A]. Proceedings of the 2019 USENIX Annual Technical Conference [C]. Berkeley, USA : USENIX Association , 2019 . 739 - 752 .
Wu X , Xu Y , Shao Z , et al . LSM-trie: an LSM-tree-based ultra-large key-value store for small data items [A]. Proceedings of the 2015 USENIX Annual Technical Conference [C]. Berkeley, USA : USENIX Association , 2015 . 71 - 82 .
Cooper B F , Silberstein A , Tam E , et al . Benchmarking cloud serving systems with YCSB [A]. Proceedings of the 1st ACM Symposium on Cloud Computing [C]. New York, USA : ACM , 2010 . 143 - 154 .
Bronson N , Amsden Z , Cabrera G , et al . TAO: Facebook's distributed data store for the social graph [A]. Proceedings of the 2013 USENIX Annual Technical Conference [C]. Berkeley, USA : USENIX Association , 2013 . 49 - 60 .
0
浏览量
15
下载量
0
CSCD
关联资源
相关文章
相关作者
相关机构
京公网安备11010802024621