Multi-Layer Focused Inception-V3 Models for Fine-Grained Visual Recognition

WANG Bo; HUANG Mian; LIU Li-jun; HUANG Qing-song; SHAN Wen-qi

doi:10.12263/DZXB.20200443


PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 50, Issue 1, Pages: 72-78 (2022)
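- Author affiliations:
  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500
  2. Information Center, Yunnan Land and Resources Vocational College, Kunming, Yunnan 652501
  3. School of Information, Yunnan University, Kunming, Yunnan 650091
  4. Yunnan Key Laboratory of Computer Technology Applications, Kunming, Yunnan 650500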
- DOI: 10.12263/DZXB.20200443
  CLC: TP391
- Received: 09 May 2020
  Revised: 10 October 2020
  Published: 25 January 2022

WANG Bo, HUANG Mian, LIU Li-jun, et al. Multi-Layer Focused Inception-V3 Models for Fine-Grained Visual Recognition[J]. ACTA ELECTRONICA SINICA, 2022, 50(01): 72-78. DOI: 10.12263/DZXB.20200443.

Abstract

Fine-grained images are characterized by variable structure, heavy background interference, small inter-class differences, and large intra-class differences, so accurately localizing and extracting discriminative local features is crucial. This paper proposes a multi-layer focused convolutional network. The first-layer focus network accurately and effectively focuses on the discriminative local regions and generates localization regions; the original image is then cropped and occluded (dropped) according to these regions, and the results are fed into the next-layer focus network for training and classification. Each single-layer focus network is built on Inception-V3 and focuses on effective localization regions through a convolutional block feature attention module and a localization region selection mechanism. Bilinear attention max pooling extracts the features of each local part, and classification is performed last. Experiments on three widely used fine-grained datasets, CUB-2011, FGVC-Aircraft, and Stanford Cars, yield Top-1 accuracies of 89.7%, 93.6%, and 95.1%, respectively. The results show that the model's classification accuracy is higher than that of current mainstream methods.
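The crop, occlusion, and attention pooling steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`locate_region`, `crop`, `drop`, `attention_max_pool`), the relative threshold of 0.5, and the toy 4×4 data are all hypothetical. It also assumes that the attention map has already been resized to the image resolution, and that "attention max pooling" means weighting each feature channel element-wise by an attention map and then taking the spatial maximum.

```python
from typing import List, Tuple

Grid = List[List[float]]          # one 2-D map: an image channel or an attention map
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), all inclusive


def locate_region(attention: Grid, threshold: float) -> Box:
    """Bounding box of all cells whose value exceeds threshold * max(attention)."""
    peak = max(max(row) for row in attention)
    cut = threshold * peak
    hits = [(r, c)
            for r, row in enumerate(attention)
            for c, value in enumerate(row)
            if value > cut]
    if not hits:
        # Nothing stands out (assumed fallback): use the whole map.
        return 0, 0, len(attention) - 1, len(attention[0]) - 1
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)


def crop(image: Grid, box: Box) -> Grid:
    """Keep only the located region, so the next layer sees it in isolation."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def drop(image: Grid, box: Box, fill: float = 0.0) -> Grid:
    """Occlude the located region, pushing the next layer to look elsewhere."""
    top, left, bottom, right = box
    return [[fill if top <= r <= bottom and left <= c <= right else value
             for c, value in enumerate(row)]
            for r, row in enumerate(image)]


def attention_max_pool(features: List[Grid], attentions: List[Grid]) -> List[List[float]]:
    """One feature vector per attention map: weight every feature channel
    element-wise by the map, then take the maximum over all spatial positions."""
    return [[max(a * f
                 for a_row, f_row in zip(att, channel)
                 for a, f in zip(a_row, f_row))
             for channel in features]
            for att in attentions]


# Toy data: a 4x4 attention map with a bright 2x2 centre and a 4x4 "image".
attention = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.7, 1.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]

box = locate_region(attention, threshold=0.5)
print(box)                                        # (1, 1, 2, 2)
print(crop(image, box))                           # [[5.0, 6.0], [9.0, 10.0]]
print(drop(image, box)[1])                        # [4.0, 0.0, 0.0, 7.0]
print(attention_max_pool([image], [attention]))   # [[10.0]]
```

In this sketch the cropped image and the occluded image would both be passed to the next focus layer alongside the original. In a real network the feature maps would come from Inception-V3 and the attention maps from the attention module, rather than from hand-written lists.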
