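1. State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing, Jiangsu 210023
2. Department of Computer Science and Technology, Nanjing University, Nanjing, Jiangsu 210023
3. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106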
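GU Rong (顾荣), male, born in 1988 in Taizhou, Jiangsu. Ph.D., distinguished research fellow at Nanjing University. His main research interests are big data and cloud computing systems, and distributed caching and efficient indexing systems. E-mail: gurong@nju.edu.cn
LUO Yi-li (罗义力), male, born in 1998. Master's student in the Department of Computer Science and Technology, Nanjing University. His main research interest is distributed storage systems. E-mail: luoyl@smail.nju.edu.cn
QIU Ling-wei (仇伶玮), male, born in 1994. Master's student in the Department of Computer Science and Technology, Nanjing University. His main research interest is deep learning systems. E-mail: mp1933003@smail.nju.edu.cn
WANG Zhao-kang (王肇康, corresponding author), male, born in 1990 in Zhengzhou, Henan. Ph.D., lecturer, and master's supervisor. His main research interests are distributed graph computing, distributed data processing, and cloud computing technology.
DAI Hai-peng (戴海鹏), male, born in 1985 in Loudi, Hunan. Ph.D., associate professor, and doctoral supervisor. His main research interests are efficient data indexing and the Internet of Things. E-mail: haipengdai@nju.edu.cn
HUANG Yi-hua (黄宜华), male, born in 1962 in Taizhou, Jiangsu. Ph.D., professor, and doctoral supervisor. His main research interests are big data systems, automated machine learning, and text analysis and processing technology. E-mail: yhuang@nju.edu.cn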
Received: 2022-04-08,
Revised: 2022-08-16,
Published in print: 2023-06-25
GU Rong, LUO Yi-li, QIU Ling-wei, et al. Reading and Writing Performance Optimization of Cross-Language FUSE Framework[J]. ACTA ELECTRONICA SINICA, 2023, 51(06): 1590-1606. DOI: 10.12263/DZXB.20220372.
Data analytics applications, deep learning chief among them, increasingly rely on distributed file systems to store and manage large-scale datasets. To stay compatible with such applications, distributed file systems usually need to expose a standard POSIX interface so that upper-layer applications can use them without modification. Developing a POSIX-compatible file system as a kernel module, however, is complex and time-consuming. In recent years, the Filesystem in Userspace (FUSE) framework has greatly simplified file system development and has been adopted by many well-known distributed file systems, including Alluxio and Ceph. The commonly used user-space FUSE library, libfuse, provides only a C programming interface, whereas most distributed file systems for big data, such as HDFS and Alluxio, are developed in Java. A cross-language FUSE framework is therefore needed as an intermediary between a Java-based distributed file system and the C-based FUSE library. Such a framework uses a cross-language function callback mechanism that lets the C functions of the FUSE library invoke the Java programming interface provided by the distributed file system, which gives the Java-based file system standard POSIX access. Existing cross-language FUSE frameworks execute inefficiently, however, so data I/O accounts for a significant share of the running time of data-intensive jobs such as deep learning and big data analysis and becomes a potential performance bottleneck. To address this problem, we first evaluate the widely used cross-language FUSE framework JNR-FUSE and locate its performance bottlenecks in high-concurrency and small-file scenarios. We then analyze the root causes of these bottlenecks from multiple perspectives, summarize optimization directions for an efficient cross-language FUSE framework, and, following these directions, design and implement JNI-FUSE (Java Native Interface-Filesystem in User SpacE), a cross-language FUSE framework for Java. JNI-FUSE uses optimization techniques such as defer detach and meta cache to reduce the cost of cross-language function callbacks. Experimental results show that, compared with JNR-FUSE, the best-performing existing Java FUSE framework, JNI-FUSE improves FUSE framework performance by 1.15 to 6.04 times and end-to-end file system performance by 1.90 to 2.71 times, and speeds up upper-layer deep learning training by 1.06 to 1.73 times. Owing to its performance advantage, JNI-FUSE has been officially accepted and integrated by Alluxio, a well-known open-source distributed file system.
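The abstract names defer detach as an optimization but does not describe how it works. One plausible reading, stated here as an assumption, is that a native FUSE worker thread stays attached to the JVM across callbacks (the JNI Invocation API functions `AttachCurrentThread` and `DetachCurrentThread` would otherwise bracket every call) and is detached only when the thread ends. The sketch below is a pure-Java simulation of that counting effect, not JNI code and not the JNI-FUSE implementation; the class `DeferDetachSketch` and all of its methods are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Pure-Java simulation of eager vs. deferred thread detach (hypothetical names throughout). */
final class DeferDetachSketch {
    /** Number of simulated attach operations; stands in for the cost of attaching a thread to the JVM. */
    static final AtomicLong attaches = new AtomicLong();

    /** Per-thread flag recording whether this worker is currently "attached". */
    private static final ThreadLocal<Boolean> attached = ThreadLocal.withInitial(() -> Boolean.FALSE);

    /** Eager policy: attach before every callback and detach right after it. */
    static void callbackEager(Runnable fileOp) {
        attaches.incrementAndGet(); // simulated attach
        fileOp.run();
        // Simulated detach: nothing is remembered, so the next callback attaches again.
    }

    /** Deferred policy: attach on the first callback only; the detach waits until the thread ends. */
    static void callbackDeferred(Runnable fileOp) {
        if (!attached.get()) {
            attaches.incrementAndGet(); // simulated attach, once per thread
            attached.set(Boolean.TRUE);
        }
        fileOp.run();
    }

    /** Runs callbacksPerThread callbacks on each of `threads` fresh workers and returns the attach count. */
    static long run(boolean deferred, int threads, int callbacksPerThread) {
        attaches.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int c = 0; c < callbacksPerThread; c++) {
                    if (deferred) {
                        callbackDeferred(() -> { });
                    } else {
                        callbackEager(() -> { });
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return attaches.get();
    }

    public static void main(String[] args) {
        System.out.println("eager attaches:    " + run(false, 4, 1000));
        System.out.println("deferred attaches: " + run(true, 4, 1000));
    }
}
```

With 4 workers issuing 1000 callbacks each, the eager policy performs 4000 simulated attaches and the deferred policy performs 4, one per thread. Under the assumption above, this is why the saving would grow with the number of callbacks, as in workloads that touch many small files.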