

School of Integrated Circuits, Tsinghua University, Beijing 100084, China
Received: 19 October 2023
Revised: 3 April 2024
Published: 25 April 2024
LI Jia-ning, YAO Peng, JIE Lu, et al. Research Status of Computing-in-Memory Technology[J]. Acta Electronica Sinica, 2024, 52(04): 1103-1117. DOI:10.12263/DZXB.20230967
The von Neumann computer architecture faces the "memory wall" bottleneck, which hinders performance improvement in artificial intelligence (AI) computing. Computing-in-memory (CIM) hardware breaks through the memory wall and greatly improves AI computing performance. CIM schemes have now been implemented on a variety of storage media and, according to the type of signal used for computation, can be divided into digital CIM and analog CIM. Although CIM has delivered large gains in AI computing performance, its further development still faces major challenges. This article presents a comparative analysis of CIM schemes in different signal domains, points out the main advantages and disadvantages of each scheme, and identifies the challenges facing CIM technology. We believe that, with cross-layer collaborative research spanning process integration, devices, circuits, architecture, and software toolchains, CIM will provide more powerful and efficient computing power for AI at both the edge and the cloud.