

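1. Special Technology Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
2. University of Chinese Academy of Sciences, Beijing 100049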
Received: 30 March 2023
Revised: 16 October 2023
Published: 25 November 2023
ZHAO Er-hu, WU Ji-wen, XIAO Si-ying, et al. Parallel Multi Pipeline Design of Embedded Heterogeneous AI Computing Systems[J]. ACTA ELECTRONICA SINICA, 2023, 51(11): 3354-3364. DOI: 10.12263/DZXB.20230281.
Embedded AI computing systems operate under tight power budgets while needing real-time intelligent processing of data from multiple sensors, which places severe demands on both the energy efficiency of the hardware platform's AI compute and the degree of parallelism of AI computing services. The DSP+FPGA digital signal processing architecture commonly used in traditional embedded computing systems is not suited to accelerating inference for multiple neural network models. Based on the ARM+DLP+SRIO embedded heterogeneous intelligent computing architecture, this paper proposes a parallel multi-pipeline design method that exploits the multi-chip, multi-core, and multi-memory-channel characteristics of deep learning processors. The method accounts for the time cost of data transmission, copy, inference, and result feedback in an AI computing service, and allocates computing resources to the different neural network models so as to maximize end-to-end throughput. Experimental results show that, with the parallel multi-pipeline design method, deep learning processor utilization is on average about 25.2% higher than with a single pipeline and on average about 30.7% higher than without pipelining. The method meets the real-time intelligent processing requirements of multi-modal imagery such as visible-light, infrared, and SAR images, and has practical application value.
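The resource-allocation idea in the abstract can be illustrated with a small timing model. The sketch below is not the paper's algorithm: it assumes a hypothetical per-frame I/O time (transmission, copy, and feedback combined), a hypothetical single-core inference time for each model, and ideal linear speed-up when a model is given more cores. Under those assumptions, a pipelined model is limited by its slowest stage, and a simple greedy rule hands each spare core to whichever model currently has the longest frame period. All names and numbers (`Model`, `frame_period`, `allocate_cores`, the three example models) are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    io_ms: float     # hypothetical: transmission + copy + feedback time per frame
    infer_ms: float  # hypothetical: inference time per frame on ONE core


def frame_period(model: Model, cores: int, pipelined: bool) -> float:
    """Steady-state time between finished frames for one model.

    Assumes inference time scales as 1/cores (an idealisation).
    Pipelined: I/O overlaps inference, so the slower stage sets the pace.
    Not pipelined: the stages run back to back.
    """
    infer = model.infer_ms / cores
    return max(model.io_ms, infer) if pipelined else model.io_ms + infer


def allocate_cores(models: list[Model], total_cores: int) -> dict[str, int]:
    """Greedy allocation: one core per model, then each spare core goes
    to the model whose pipelined frame period is currently the longest."""
    if total_cores < len(models):
        raise ValueError("need at least one core per model")
    alloc = {m.name: 1 for m in models}
    for _ in range(total_cores - len(models)):
        slowest = max(models, key=lambda m: frame_period(m, alloc[m.name], True))
        alloc[slowest.name] += 1
    return alloc


if __name__ == "__main__":
    # Illustrative workloads only; these are not measurements from the paper.
    models = [
        Model("visible", io_ms=4.0, infer_ms=24.0),
        Model("infrared", io_ms=4.0, infer_ms=12.0),
        Model("sar", io_ms=4.0, infer_ms=12.0),
    ]
    alloc = allocate_cores(models, total_cores=8)
    for m in models:
        print(m.name, alloc[m.name], frame_period(m, alloc[m.name], True))
```

In this toy model the heavier network ends up with more cores, so all pipelines finish frames at a similar rate. A real system would replace the linear-scaling assumption with measured stage times per core count.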