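1. College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
2. Automotive Data of China (Tianjin) Co., Ltd., Tianjin 300380, China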
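WU Bang-gu: male, born in 1996 in Hubei Province. Master's student in computer science and technology. His research interest is computer vision. E-mail: wubanggu@tju.edu.cn
ZHANG Su-lin: male, born in 1987 in Liaoning Province. Senior technical manager at Automotive Data of China (Tianjin) Co., Ltd. His main research interests are intelligent connected vehicles and autonomous driving control and perception systems. E-mail: zhangsulin@catarc.ac.cn
SHI Hong: female, born in 1975 in the Inner Mongolia Autonomous Region. Associate professor and master's supervisor. Her main research interests are rough sets and machine learning. E-mail: serena@tju.edu.cn
ZHU Peng-fei: male, born in 1986 in Henan Province. Associate professor and doctoral supervisor. His main research interests are UAV vision, human-machine collaborative learning, and metric learning. E-mail: zhupengfei@tju.edu.cn
WANG Qi-long (corresponding author): male, born in 1989 in Heilongjiang Province. Associate professor and master's supervisor. His main research interests include video and image analysis and deep learning. E-mail: qlwang@tju.edu.cn
HU Qing-hua: male, born in 1976 in Hunan Province. Professor and doctoral supervisor. His main research interests are data uncertainty modeling and multimodal data learning. E-mail: huqinghua@tju.edu.cn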
Received: 2020-10-28
Revised: 2021-01-25
Published in print: 2022-02-25
WU Bang-gu, ZHANG Su-lin, SHI Hong, et al. Multi-Branch Structure Based Local Channel Attention with Uncertainty [J]. Acta Electronica Sinica, 2022, 50(02): 374-382. DOI: 10.12263/DZXB.20201204.
Recent research shows that visual attention mechanisms are an effective way to improve the performance of deep convolutional neural networks (CNNs). However, most existing attention methods focus on modeling the correlations among all convolutional channels, which limits the computational efficiency of the model to some extent. In addition, these methods do not explicitly consider the effect of uncertainty in the correlation modeling process, and they leave the generalization ability and stability of attention mechanisms largely unexplored. To address these issues, a multi-branch local channel attention (MBLCA) module is proposed. MBLCA learns the weight of each channel by modeling correlations among channels within a local range rather than across all channels, which improves computational efficiency. It also models the uncertainty of local channel attention through deep Bayesian learning approximated by Monte Carlo (MC) Dropout, which leads to a multi-branch structure. The proposed MBLCA module can be flexibly applied to various deep CNN architectures. For example, compared with state-of-the-art counterparts, ResNet-50 with the MBLCA module achieves a 2.58% improvement in classification accuracy on ImageNet-1K and a 1.9% improvement in average precision (AP) on MS COCO.
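The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: per-channel descriptors from global average pooling, a small 1D convolution across neighbouring channels for the local correlation, and several dropout-perturbed branches whose attention weights are averaged in the spirit of MC Dropout. The function names, the kernel values, the dropout rate, the number of branches, and the point where dropout is applied are all assumptions made for illustration, not the authors' design.

```python
import math
import random

def global_avg_pool(feature_map):
    """Reduce a C x H x W nested list to one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]

def local_conv1d(values, kernel):
    """Zero-padded 1D convolution across neighbouring channels (odd kernel size)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(values))]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dropout(values, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale the rest."""
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def mc_local_channel_attention(feature_map, kernel, p=0.2, branches=4, seed=0):
    """Average the attention weights of several dropout-perturbed branches.

    Assumption: dropout is applied to the pooled channel descriptor and every
    branch shares the same kernel; the paper may place dropout elsewhere.
    """
    rng = random.Random(seed)
    pooled = global_avg_pool(feature_map)
    total = [0.0] * len(pooled)
    for _ in range(branches):
        logits = local_conv1d(dropout(pooled, p, rng), kernel)
        total = [t + sigmoid(z) for t, z in zip(total, logits)]
    return [t / branches for t in total]

def apply_attention(feature_map, weights):
    """Scale every channel by its attention weight."""
    return [[[w * v for v in row] for row in ch]
            for ch, w in zip(feature_map, weights)]

# Toy input: 4 channels of size 2 x 2, channel c filled with the value c + 1.
fmap = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
kernel = [0.25, 0.5, 0.25]  # hypothetical fixed kernel; it would be learned in practice
weights = mc_local_channel_attention(fmap, kernel)
out = apply_attention(fmap, weights)
```

Because each channel weight depends only on a fixed-size neighbourhood, the cost of the attention step in this sketch grows linearly with the number of channels, which is the efficiency argument the abstract makes against modeling all channel pairs.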