School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710072
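LI Hao-ran: male, born in June 2000 in Zhaotong, Yunnan Province. Master's student at the School of Computer Science, Northwestern Polytechnical University. His research interests include erasure codes and distributed storage. E-mail: 1148436381@mail.nwpu.edu.cn
HUANG Zhi-jie: male, born in November 1983 in Quanzhou, Fujian Province. Associate professor and master's supervisor at the School of Computer Science, Northwestern Polytechnical University. His research interests include coding theory, storage systems, and blockchain. E-mail: jayzy.huang@nwpu.edu.cn
SHI Yu-long: male, born in October 2000 in Xianyang, Shaanxi Province. Master's student at the School of Computer Science, Northwestern Polytechnical University. His research interests include erasure codes and blockchain storage. E-mail: yulongshi@mail.nwpu.edu.cn
ZHAO Cheng-jia: male, born in April 2001 in Zunyi, Guizhou Province. Master's student at the School of Computer Science, Northwestern Polytechnical University. His research interests include erasure coding techniques and distributed storage systems. E-mail: 1034886053@qq.com
ZHAO Nan-nan: female, born in October 1987 in Feicheng, Shandong Province. Associate professor and master's supervisor at the School of Computer Science, Northwestern Polytechnical University. Her research interests include cloud storage, distributed systems, and container virtualization. E-mail: nannanzhao@nwpu.edu.cn
ZHANG Xiao: male, born in February 1978 in Xinxiang, Henan Province. Professor and doctoral supervisor at the School of Computer Science, Northwestern Polytechnical University. His research interests include distributed storage systems, cloud computing and cloud storage systems, and system evaluation and simulation. E-mail: zhangxiao@nwpu.edu.cn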
Received: 2024-07-01
Revised: 2024-10-27
Published in print: 2025-02-25
LI Hao-ran, HUANG Zhi-jie, SHI Yu-long, et al. An Erasure Coding Technology Supporting Near-Data Processing in Distributed Storage Systems[J]. Acta Electronica Sinica, 2025, 53(02): 344-353. DOI:10.12263/DZXB.20240611
Erasure coding and near-data processing are two cornerstones for building efficient cloud-edge-end collaborative data management systems. The former ensures system availability by adding coding redundancy to the original data. The latter avoids significant network transmission overhead by processing data at the storage end. Cloud-edge-end collaborative data management systems typically adopt mature distributed storage systems as the underlying storage engine. However, the erasure coding implementations in mainstream distributed storage systems cannot efficiently support near-data processing. This paper proposes an erasure coding architecture that supports near-data processing. Its basic principle is to re-lay out each group of data to be encoded so that semantically related data are stored on the same storage device, thereby avoiding cross-node data transmission during near-data processing. The scheme has been implemented on the distributed storage system Ceph, and its read and write performance has been tested in typical scenarios. Experimental results show that object read performance improves by about 59.4% in the near-data processing scenario and by about 10% in the conventional read scenario. Object write performance remains consistent with the original (unmodified) Ceph.
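The layout principle described above can be illustrated with a minimal Python sketch. It is not the authors' Ceph implementation and makes two simplifying assumptions:

- A single XOR parity chunk stands in for the Reed-Solomon-style codes used in practice.
- The function names `striped_encode`, `ndp_encode`, `read_local` and `rebuild` are hypothetical.

The sketch contrasts two layouts:

- **Conventional layout:** one object is split across all k data chunks, so reading it touches every data node.
- **Near-data-friendly layout:** each data chunk holds one whole object, so that object can be read or processed on a single node.

In both cases the stripe still survives the loss of any one chunk.

```python
from functools import reduce
from typing import List


def xor_parity(chunks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length chunks: a single-parity (k+1) code.
    Assumption: stands in for the Reed-Solomon code a real system would use."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def striped_encode(obj: bytes, k: int) -> List[bytes]:
    """Conventional layout: ONE object is split across k data chunks (+1 parity).
    Reading the object back requires every data node."""
    size = -(-len(obj) // k)                      # ceiling division
    padded = obj.ljust(size * k, b"\0")           # pad so the object divides evenly
    data = [padded[i * size:(i + 1) * size] for i in range(k)]
    return data + [xor_parity(data)]


def ndp_encode(objs: List[bytes]) -> List[bytes]:
    """Hypothetical near-data layout: k WHOLE objects, one per data chunk (+1 parity).
    Each object lives on a single node, so it can be processed in place."""
    size = max(len(o) for o in objs)
    data = [o.ljust(size, b"\0") for o in objs]   # pad to a common chunk size
    return data + [xor_parity(data)]


def read_local(stripe: List[bytes], index: int, length: int) -> bytes:
    """Read object `index` from its own node only (no cross-node traffic).
    `length` is the original object size, assumed to be kept as metadata."""
    return stripe[index][:length]


def rebuild(stripe: List[bytes], lost: int) -> bytes:
    """Recover one lost chunk by XOR-ing all surviving chunks (data + parity)."""
    return xor_parity([c for i, c in enumerate(stripe) if i != lost])


if __name__ == "__main__":
    objs = [b"sensor-A:21.5C", b"sensor-B:19C", b"sensor-C:22.1C", b"sensor-D:20C"]
    stripe = ndp_encode(objs)
    print(read_local(stripe, 2, len(objs[2])))              # b'sensor-C:22.1C'
    print(rebuild(stripe, 2)[: len(objs[2])] == objs[2])    # True
    conv = striped_encode(objs[0], k=4)
    # Number of data nodes that hold part of objs[0] in the conventional layout
    print(sum(1 for c in conv[:4] if c.strip(b"\0")))       # 4
```

Padding every object to the size of the largest one wastes space when object sizes differ widely. The abstract does not say how the actual scheme groups or sizes objects, so this sketch should be read only as an illustration of the placement idea.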