An Improved Attention Mechanism Algorithm Model and Hardware Acceleration Design Method

WANG Ying; WANG Jing; GAO Lan; LÜ Xu; ZHANG Wei-gong

doi:10.12263/DZXB.20211229


PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 51, Issue 4, Pages: 1021-1029 (2023)
- Author affiliations:
  1. College of Information Engineering, Capital Normal University, Beijing 100048
  2. School of Mathematical Science, Capital Normal University, Beijing 100048
- Funding: National Natural Science Foundation of China (62076168)
- DOI: 10.12263/DZXB.20211229
- CLC: TP302.8
- Received: 06 September 2021; Revised: 27 December 2021; Published: 25 April 2023
WANG Ying, WANG Jing, GAO Lan, et al. An Improved Attention Mechanism Algorithm Model and Hardware Acceleration Design Method[J]. ACTA ELECTRONICA SINICA, 2023, 51(04): 1021-1029. DOI: 10.12263/DZXB.20211229.

Abstract

Applying attention mechanisms to convolutional neural networks unavoidably increases the amount of computation and the latency. To address this, this paper proposes an optimized CBAM (Convolutional Block Attention Module) algorithm model and implements it in hardware. Starting from the traditional CBAM structure, we analyze latent problems inside the algorithm and design a model that better matches the original intent of extracting attention importance parameters. At the same time, we optimize the computation flow to reduce the amount of data computation and to combine operators for maximum parallelism. Taking advantage of the efficient, flexible parallel arrays that an FPGA (Field Programmable Gate Array) allows, we design a hardware acceleration engine for the improved CBAM algorithm. Experimental results show that, compared with the traditional CBAM mechanism, the improved attention mechanism maintains almost the same accuracy as the original model. With the hardware acceleration engine deployed on an FPGA and running inference at 180 MHz, analysis shows that, under the same hardware resources, the proposed design achieves a 10.2% increase in computation speed for the attention mechanism circuit and a 4.5% increase in inference speed for the VGG16 network model.
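For readers unfamiliar with the baseline, the following is a minimal NumPy sketch of the conventional CBAM computation (channel attention followed by spatial attention), as it is commonly described in the literature. It is not the improved model or the hardware design proposed in this paper, whose details are not given in the abstract. The function names, the reduction ratio implied by the weight shapes, and the 7x7 kernel size are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """x: (C, H, W); w1: (C/r, C); w2: (C, C/r). Returns (C,) weights in (0, 1)."""
    avg = x.mean(axis=(1, 2))  # global average pooling over H and W
    mx = x.max(axis=(1, 2))    # global max pooling over H and W

    def shared_mlp(v):
        # The same two-layer MLP (ReLU in between) is applied to both vectors.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(shared_mlp(avg) + shared_mlp(mx))


def spatial_attention(x, kernel):
    """x: (C, H, W); kernel: (2, k, k) with k odd. Returns (H, W) weights in (0, 1)."""
    # Pool across channels, then stack the two maps as a 2-channel input.
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Direct (unoptimized) 2-channel k x k convolution at position (i, j).
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)


def cbam(x, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a (C, H, W) feature map."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]


# Illustrative usage with random (untrained) weights; all sizes are assumptions.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
y = cbam(x, w1, w2, kernel)
print(y.shape)  # (8, 6, 6)
```

The sketch makes the extra cost visible: each CBAM block adds two pooling passes, a small MLP, a convolution, and two element-wise rescalings on top of the host network, which is the overhead the paper sets out to reduce.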


