Acta Electronica Sinica ›› 2021, Vol. 49 ›› Issue (3): 443-453. DOI: 10.12263/DZXB.20200415

• Research Paper •

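Research on an Embedded Intelligent Computing System Based on ARM+DLP+SRIO

ZHAO Er-hu, WU Ji-wen, ZHA Jing-jing, GUO Zhen, XU Yong-jun

  1. Special Technology Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China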
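  • Received: 2020-05-06 Revised: 2020-08-03 Published: 2021-03-25
    • About the authors:
    • ZHAO Er-hu, male, born in 1985 in Xingtai, Hebei. He is a senior engineer at the Institute of Computing Technology, Chinese Academy of Sciences, leader of the Intelligent Computing Platform research group in the Special Technology Research Center, and a doctoral candidate. His main research interest is embedded intelligent computing systems. E-mail: zhaoerhu@ict.ac.cn; WU Ji-wen, male, born in 1987 in Shangrao, Jiangxi. He is an engineer at the Institute of Computing Technology, Chinese Academy of Sciences. His main research interest is hardware-software co-optimization of embedded intelligent computing systems. E-mail: wujiwen@ict.ac.cn; ZHA Jing-jing, female, born in 1994 in Zhoukou, Henan. She is an engineer at the Institute of Computing Technology, Chinese Academy of Sciences. Her main research interest is algorithm optimization and acceleration in edge heterogeneous intelligent computing systems. E-mail: zhajingjing@ict.ac.cn; GUO Zhen, male, born in 1992 in Shangluo, Shaanxi. He is an engineer at the Institute of Computing Technology, Chinese Academy of Sciences. His main research interest is algorithm porting and optimization for embedded intelligent computing systems. E-mail: guozhen@ict.ac.cn; XU Yong-jun, male, born in 1979 in Anqing, Anhui. He is a professor-level senior engineer and doctoral supervisor at the Institute of Computing Technology, Chinese Academy of Sciences, and director of the Special Technology Research Center. His main research interest is data intelligence. E-mail: xyj@ict.ac.cn
    • Funding:
    • 13th Five-Year Plan Field Fund (No.61403120111)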

Embedded AI Computing System Based on ARM+DLP+SRIO

ZHAO Er-hu, WU Ji-wen, ZHA Jing-jing, GUO Zhen, XU Yong-jun   

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2020-05-06 Revised: 2020-08-03 Online: 2021-03-25 Published: 2021-03-25
    • Supported by:
    • 13th Five-year Plan Field Fund (No.61403120111)

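Abstract: Current mainstream AI computing platforms, typified by x86+GPU, are constrained by power consumption, size, bandwidth, and environmental adaptability, which makes them unsuitable for end-device and edge intelligent computing scenarios. This paper proposes and studies an embedded intelligent computing system based on ARM+DLP+SRIO, analyzes its design rationale and technical advantages in terms of AI computing power, energy efficiency, and IO bandwidth, and verifies its functions and performance experimentally. The experimental results show that the system reaches a peak AI computing power of 114.9 TOPS, an energy efficiency of 1.03 TFLOPS/W, and an IO bandwidth of 20 Gbps. Among intelligent computing systems, its energy efficiency exceeds that of other known comparable boards and systems in China, and its adaptability to embedded environments exceeds that of traditional desktops and servers, so it can serve as a general-purpose hardware acceleration platform for AI computing tasks in end-device and edge environments.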

 

Key words: artificial intelligence, deep learning processor, embedded intelligent computing system, Serial RapidIO, energy efficiency

Abstract: Existing artificial intelligence (AI) computing platforms, represented by x86+GPU, are limited by power consumption, size, bandwidth, environmental adaptability, and other factors, and so are poorly suited to end-device and edge intelligent computing scenarios. We propose an embedded AI computing system based on ARM (Advanced RISC Machine) + DLP (Deep Learning Processor) + SRIO (Serial RapidIO) and describe its design approach and technical advantages. Three aspects of the system are analyzed: AI computing performance, power efficiency, and IO bandwidth. The function and performance of the system are verified by experiments. The results show that the peak performance of the embedded AI computing system based on ARM+DLP+SRIO reaches 114.9 TOPS, its energy efficiency reaches 1.03 TFLOPS/W, and its IO bandwidth reaches 20 Gbps. Among AI computing systems, its energy efficiency is better than that of other known similar boards or systems in China, and its adaptability to embedded environments is better than that of traditional desktops and servers, so it can serve as a general hardware acceleration platform for AI computing tasks in end-device and edge computing scenarios.

 

Key words: artificial intelligence, deep learning processor, embedded AI computing system, Serial RapidIO, power efficiency

CLC number: