Kernelized Bilinear CNN Models for Fine-Grained Visual Recognition

GE Shu-yu, GAO Zi-lin, ZHANG Bing-bing, LI Pei-hua

Acta Electronica Sinica, 2019, Vol. 47, Issue 10: 2134-2141. DOI: 10.3969/j.issn.0372-2112.2019.10.015

Research Article

Abstract

The bilinear convolutional neural network (B-CNN) has been widely used in computer vision. By performing an outer product on the features output by a convolutional layer, B-CNN captures the linear correlations between different channels and thereby enhances the representational ability of the network. However, because the non-linear relationships between channels of the feature map are not taken into account, this method cannot fully exploit the richer information the channels carry. To address this limitation, this paper proposes a kernelized bilinear convolutional neural network that employs a kernel function to effectively model the non-linear relationships between channels of the feature map, further enhancing the representational ability of the network. The method is evaluated on three widely used fine-grained benchmarks: CUB-200-2011, FGVC-Aircraft, and Cars. Experiments show that it outperforms comparable methods on all three benchmarks.
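
The mechanism described in the abstract can be made concrete with a short sketch. The NumPy code below contrasts standard bilinear (B-CNN) pooling, which sums the outer products of per-location channel vectors and therefore encodes only linear channel correlations, with a kernelized variant in which the inner product between two channels is replaced by a kernel function. The polynomial kernel, its hyper-parameters, the function names, and the signed-square-root/L2 normalization shown here are illustrative assumptions for this sketch only; they are not taken from the paper and may differ from its exact formulation.

import numpy as np

def bilinear_pool(feat):
    """Standard bilinear pooling (B-CNN style sketch).

    feat: convolutional feature map of shape (H, W, C).
    Returns a vector of length C*C built from the channel-wise
    outer-product (Gram) matrix, i.e. linear channel correlations.
    """
    H, W, C = feat.shape
    X = feat.reshape(H * W, C)               # rows = spatial positions, columns = channels
    B = X.T @ X / (H * W)                    # (C, C) matrix of channel-pair inner products
    z = B.reshape(-1)
    z = np.sign(z) * np.sqrt(np.abs(z))      # signed square-root normalization (assumed)
    return z / (np.linalg.norm(z) + 1e-12)   # L2 normalization (assumed)

def kernelized_bilinear_pool(feat, degree=2, c=1.0):
    """Kernelized variant (sketch): replace the inner product between two
    channels with a kernel, here an illustrative polynomial kernel
    k(x_i, x_j) = (<x_i, x_j> + c)^degree, so non-linear channel
    relationships are captured as well.
    """
    H, W, C = feat.shape
    X = feat.reshape(H * W, C)
    G = X.T @ X / (H * W)                    # pairwise channel inner products
    K = (G + c) ** degree                    # element-wise polynomial kernel over channel pairs
    z = K.reshape(-1)
    z = np.sign(z) * np.sqrt(np.abs(z))
    return z / (np.linalg.norm(z) + 1e-12)

if __name__ == "__main__":
    fmap = np.random.rand(28, 28, 512).astype(np.float32)   # e.g. a VGG-16 conv5_3-sized map
    print(bilinear_pool(fmap).shape)              # (262144,) = 512 * 512
    print(kernelized_bilinear_pool(fmap).shape)   # (262144,)

For a feature map with C = 512 channels, both descriptors have dimension 512 x 512; the kernelized version changes only how each channel pair is compared, not the size of the pooled representation.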

Key words

kernelized bilinear pooling / bilinear convolutional neural network / end-to-end learning / fine-grained visual recognition

Cite this article

GE Shu-yu, GAO Zi-lin, ZHANG Bing-bing, LI Pei-hua. Kernelized Bilinear CNN Models for Fine-Grained Visual Recognition[J]. Acta Electronica Sinica, 2019, 47(10): 2134-2141. https://doi.org/10.3969/j.issn.0372-2112.2019.10.015
CLC number: TP391


Funding

National Natural Science Foundation of China (No. 61471082)