

浏览全部资源
扫码关注微信
宁波大学信息科学与工程学院, 浙江宁波 315211
Received:27 August 2020,
Revised:2021-04-14,
Published:25 November 2021
移动端阅览
俞加平,陈华辉,钱江波等.LSM树中基于热度预测的异构布隆过滤器方案[J].电子学报,2021,49(11):2090-2095.
YU Jia-ping,CHEN Hua-hui,QIAN Jiang-bo,et al.A Heterogeneous Bloom Filter Scheme in LSM Tree Based on Hotness Prediction[J].ACTA ELECTRONICA SINICA,2021,49(11):2090-2095.
俞加平,陈华辉,钱江波等.LSM树中基于热度预测的异构布隆过滤器方案[J].电子学报,2021,49(11):2090-2095. DOI: 10.12263/DZXB.20200945.
YU Jia-ping,CHEN Hua-hui,QIAN Jiang-bo,et al.A Heterogeneous Bloom Filter Scheme in LSM Tree Based on Hotness Prediction[J].ACTA ELECTRONICA SINICA,2021,49(11):2090-2095. DOI: 10.12263/DZXB.20200945.
日志结构合并(Log-Structured-Merge,LSM)树中常使用布隆过滤器减少无效磁盘I/O.但是用户无法无限制地细化布隆过滤器的粒度,原因是在一些数据量庞大而数据项较小的工作流中,这些元数据需要占用大量存储空间.其次在一些内存受限的环境下,内存缓冲区无法容纳更多的过滤器数据,造成缓冲区与磁盘的频繁数据交换.针对上述问题本文提出LSM树中的异构布隆过滤器方案,在LSM树的每一层维护热度预测模型,新生成的SSTable通过预测的热度来分配不同粒度的布隆过滤器,然后使用特定缓存管理方案来维护缓存中的过滤器数据并处理工作流热度发生改变的情况.实验证明,本文的方案在保持相同外存占用与内存消耗的情况下,读取吞吐量比采用原始LSM树结构的LevelDB提升22%~53%.
Bloom filters are often used in the LSM(Log-Structured-Merge) tree to reduce unnecessary disk I/O. However
users cannot refine the granularity of Bloom filters infinitely
because in some workflows with huge data volume and small data items
these metadata require a lot of storage space. Secondly
in some memory-constrained environments
the cache cannot accommodate more filter data
resulting in frequent data exchange between memory and disk. In view of the above problems
we propose a heterogeneous Bloom filter scheme in LSM tree. The hotness prediction model is maintained at each layer of the LSM tree
and the newly generated disk components are distributed with different granularity according to the predicted hotness. And a specific cache management scheme is used to maintain the filter data in memory cache and deal with changes in workflow hotness. It is experimentally proven that the proposed scheme improves read throughput by 22%~53% over LevelDB with the original LSM tree structure while maintaining the same storage occupation and memory consumption.
O'Neil P , Cheng E , Gawlick D , et al . The log-structured merge-tree (LSM-tree) [J]. Acta Informatica , 1996 , 33 ( 4 ): 351 - 385 .
Dayan N , Idreos S . The log-structured merge-bush & the wacky continuum [A]. Proceedings of the 2019 International Conference on Management of Data [C]. New York, USA : ACM , 2019 . 449 - 466 .
Luo C , Carey M J . LSM-based storage techniques: a survey [J]. The VLDB Journal , 2020 , 29 ( 1 ): 393 - 418 .
Bloom B H . Space/time trade-offs in Hash coding with allowable errors [J]. Communications of the ACM , 1970 , 13 ( 7 ): 422 - 426 .
Chang F , Dean J , Ghemawat S , et al . Bigtable: a distributed storage system for structured data [J]. ACM Transactions on Computer Systems (TOCS) , 2008 , 26 ( 2 ): 1 - 26 .
Dayan N , Athanassoulis M , Idreos S . Monkey: optimal navigable key-value store [A]. Proceedings of the 2017 ACM International Conference on Management of Data [C]. New York, USA : ACM , 2017 . 79 - 94 .
Dayan N , Athanassoulis M , Idreos S . Optimal bloom filters and adaptive merging for LSM-trees [J]. ACM Transactions on Database Systems (TODS) , 2018 , 43 ( 4 ): 1 - 48 .
Li Y , Tian C , Guo F , et al . ElasticBF: elastic bloom filter with hotness awareness for boosting read performance in large key-value stores [A]. Proceedings of the 2019 USENIX Annual Technical Conference [C]. Berkeley, USA : USENIX Association , 2019 . 739 - 752 .
Wu X , Xu Y , Shao Z , et al . LSM-trie: an LSM-tree-based ultra-large key-value store for small data items [A]. Proceedings of the 2015 USENIX Annual Technical Conference [C]. Berkeley, USA : USENIX Association , 2015 . 71 - 82 .
Cooper B F , Silberstein A , Tam E , et al . Benchmarking cloud serving systems with YCSB [A]. Proceedings of the 1st ACM Symposium on Cloud Computing [C]. New York, USA : ACM , 2010 . 143 - 154 .
Bronson N , Amsden Z , Cabrera G , et al . TAO: Facebook's distributed data store for the social graph [A]. Proceedings of the 2013 USENIX Annual Technical Conference [C]. Berkeley, USA : USENIX Association , 2013 . 49 - 60 .
0
Views
15
下载量
0
CSCD
Publicity Resources
Related Articles
Related Author
Related Institution
京公网安备11010802024621