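Image Classification Algorithm Based on Coordinate Importance Pooling and Decoupled Class Alignment Distillation

LIU Ying; XUE Jia-hao; ZHANG Wei-dong; XU Zhi-jie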

doi:10.12263/DZXB.20240754

Research article | Updated: 2025-12-08

- Image Classification Algorithm Based on Coordinate Importance Pooling and Decoupled Class Alignment Distillation
- Acta Electronica Sinica, 2025, Vol. 53, No. 3, pp. 962-973
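- Author affiliations:
  
  1. Institute of Image and Information Processing, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121
  2. International Joint Research Center for Wireless Communication and Information Processing Technology, Xi'an, Shaanxi 710121
  3. University of Huddersfield, West Yorkshire HD1 3DH, UK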
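- Author biographies:
  
  LIU Ying, female, professor at the School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications. Her main research interests are image processing and pattern recognition. E-mail: liuying_ciip@163.com
  XUE Jia-hao, male, master's student at the School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications. His main research interest is image classification. E-mail: xuejiahao0803@163.com
  ZHANG Wei-dong, male, associate professor at the School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications. His main research interest is indoor scene understanding. E-mail: chluzhre@126.com
  XU Zhi-jie, male, professor at the School of Computing and Engineering, University of Huddersfield, UK. His main research interests are graphics and image processing. E-mail: z.xu@hud.ac.uk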
- Funding:
  
  National Natural Science Foundation of China (62106195)
- DOI: 10.12263/DZXB.20240754
  CLC number: TP391
- Received: 2024-08-13; Revised: 2025-01-13; Published in print: 2025-03-25
LIU Ying, XUE Jia-hao, ZHANG Wei-dong, et al. Image Classification Algorithm Based on Coordinate Importance Pooling and Decoupled Class Alignment Distillation[J]. Acta Electronica Sinica, 2025, 53(03): 962-973. DOI: 10.12263/DZXB.20240754

Abstract

To improve the image classification accuracy of convolutional neural networks while keeping the network lightweight, this paper proposes an image classification algorithm based on coordinate importance pooling and decoupled class alignment distillation. First, a coordinate importance pooling module is designed and embedded into ResNet34; it makes full use of the positional information of image pixels to strengthen the network's ability to discriminate important features. Second, BlurPool is adopted to mitigate the impact on network performance of the loss of shift equivariance during down-sampling, and the teacher network is built on this basis. Finally, a decoupled class alignment distillation algorithm is constructed, which considers the knowledge of the target class and the non-target classes separately and introduces correlation information between classes, so that classification knowledge is transferred efficiently from the teacher network to the lightweight MobileNetV3 student network. Experimental results on different datasets show that the proposed teacher network effectively improves classification performance, and that the student network after distillation training clearly outperforms other networks of the same scale, achieving better overall performance and making it better suited to practical scenarios with limited computational and memory resources.
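The abstract does not give the formulation of the decoupled class alignment distillation loss. As a rough illustration of what treating target-class and non-target-class knowledge separately can look like, the sketch below follows the decomposition used in decoupled knowledge distillation (Zhao et al., CVPR 2022): a binary target-versus-rest term (TCKD) and a term over the renormalised non-target classes (NCKD). This is not the authors' algorithm: the class-correlation (alignment) component is omitted, and the function names and the `alpha`, `beta`, and `temperature` parameters are assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def decoupled_terms(teacher_logits, student_logits, target, temperature=1.0):
    """Return (tckd, nckd, teacher_target_prob) for a single sample."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)

    # Target-class term: binary distribution (target vs. all other classes).
    tckd = kl_divergence([p_t[target], 1.0 - p_t[target]],
                         [p_s[target], 1.0 - p_s[target]])

    # Non-target term: probabilities renormalised over the non-target classes.
    rest_t = [p / (1.0 - p_t[target]) for i, p in enumerate(p_t) if i != target]
    rest_s = [p / (1.0 - p_s[target]) for i, p in enumerate(p_s) if i != target]
    nckd = kl_divergence(rest_t, rest_s)
    return tckd, nckd, p_t[target]


def decoupled_kd_loss(teacher_logits, student_logits, target,
                      alpha=1.0, beta=1.0, temperature=1.0):
    """Weighted sum alpha * TCKD + beta * NCKD, scaled by temperature ** 2."""
    tckd, nckd, _ = decoupled_terms(teacher_logits, student_logits,
                                    target, temperature)
    return (alpha * tckd + beta * nckd) * temperature ** 2


# Hypothetical logits for a 4-class problem whose true class is index 0.
teacher = [4.0, 1.0, 0.5, -1.0]
student = [2.5, 1.5, 0.0, -0.5]
tckd, nckd, p_target = decoupled_terms(teacher, student, target=0)

# Classical KD couples the two terms: KL(teacher || student) equals
# tckd + (1 - p_target) * nckd, so a confident teacher suppresses NCKD.
classic_kd = kl_divergence(softmax(teacher), softmax(student))
loss = decoupled_kd_loss(teacher, student, target=0, alpha=1.0, beta=8.0)
```

Weighting the two terms independently lets the non-target knowledge contribute even when the teacher is highly confident about the target class, which is the motivation usually given for decoupling.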


