The Self-Distillation HRNet Object Segmentation Based on the Pyramid Knowledge

ZHENG Yun-fei; WANG Xiao-bing; ZHANG Xiong-wei; CAO Tie-yong; SUN Meng

doi:10.12263/DZXB.20210169

PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 51, Issue 3, Pages: 746-756 (2023)
- Author affiliations:
  1. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing, Jiangsu 210007
  2. PLA Army Academy of Artillery and Air Defense, Hefei, Anhui 230031
  3. Anhui Province Key Laboratory of Polarization Imaging and Detection, Hefei, Anhui 230031
- Funding: National Natural Science Foundation of China (61801512; 62071484); Natural Science Foundation of Jiangsu Province (BK20180080)
- DOI: 10.12263/DZXB.20210169
  CLC: TP391; TP183
- Received: 26 January 2021; Revised: 15 May 2022; Published: 25 March 2023
ZHENG Yun-fei, WANG Xiao-bing, ZHANG Xiong-wei, et al. The Self-Distillation HRNet Object Segmentation Based on the Pyramid Knowledge[J]. ACTA ELECTRONICA SINICA, 2023, 51(03): 746-756. DOI: 10.12263/DZXB.20210169.

Abstract

Knowledge distillation can effectively transfer the representation ability of a teacher network to a student network and improve network performance without changing the network structure. It is therefore of great significance to construct a self-distillation learning model in HRNet (High-Resolution Net), a backbone network that performs well on object segmentation tasks. To address the challenge that the thorough fusion of deep and shallow information in HRNet's parallel structure makes direct distillation difficult, this paper proposes a structured self-distillation learning model based on a multi-scale pooling pyramid. First, multi-scale pooling pyramid representation modules are introduced into the branch structure of HRNet to improve the network's knowledge representation and learning ability. Second, two distillation modes, "top-down" and "consistency", are constructed. Finally, the cross-entropy loss, the KL (Kullback-Leibler) divergence loss, and the structural similarity loss are combined for self-distillation learning. Experiments on four segmentation datasets containing salient and camouflaged objects show that the proposed model effectively improves the object segmentation performance of the network without increasing resource overhead.
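The abstract names three ingredients (multi-scale pooling, a KL divergence term, and a structural similarity term alongside cross entropy) but gives no formulas, so the following is only a minimal sketch of how such a combined loss might look, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names, the pooling scales `(1, 2, 4)`, the weights `alpha` and `beta`, the use of a single global window for SSIM, and the choice to pool foreground-probability maps rather than intermediate features. `teacher` and `student` are simply two probability maps from the same network; which HRNet branch plays which role in the "top-down" and "consistency" modes is not specified in the abstract.

```python
import math

EPS = 1e-7  # numerical guard for logarithms


def adaptive_avg_pool(grid, bins):
    """Average-pool a 2D list of floats down to a bins x bins grid."""
    h, w = len(grid), len(grid[0])
    pooled = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, -(-(i + 1) * h // bins)  # floor, ceil
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-(j + 1) * w // bins)
            cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def _flat(grid):
    return [v for row in grid for v in row]


def _clip(p):
    return min(max(p, EPS), 1.0 - EPS)


def cross_entropy(pred, label):
    """Mean binary cross entropy between a probability map and a 0/1 mask."""
    terms = [-(y * math.log(_clip(p)) + (1 - y) * math.log(1 - _clip(p)))
             for p, y in zip(_flat(pred), _flat(label))]
    return sum(terms) / len(terms)


def kl_divergence(teacher, student):
    """Mean per-pixel Bernoulli KL(teacher || student)."""
    terms = []
    for t, s in zip(_flat(teacher), _flat(student)):
        t, s = _clip(t), _clip(s)
        terms.append(t * math.log(t / s) + (1 - t) * math.log((1 - t) / (1 - s)))
    return sum(terms) / len(terms)


def ssim_loss(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM with one global window (standard SSIM uses local windows)."""
    x, y = _flat(a), _flat(b)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((v - mx) ** 2 for v in x) / n
    vy = sum((v - my) ** 2 for v in y) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return 1.0 - ssim


def self_distillation_loss(student, teacher, label,
                           scales=(1, 2, 4), alpha=1.0, beta=1.0):
    """Hypothetical combination: CE to the label plus KL and SSIM terms
    toward the teacher, averaged over the pooling-pyramid levels."""
    loss = cross_entropy(student, label)
    for s in scales:
        t_s = adaptive_avg_pool(teacher, s)
        s_s = adaptive_avg_pool(student, s)
        loss += (alpha * kl_divergence(t_s, s_s)
                 + beta * ssim_loss(t_s, s_s)) / len(scales)
    return loss


# Toy 4x4 example: a confident "teacher" map and a less confident "student".
label = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]]
teacher = [[0.9 if y else 0.1 for y in row] for row in label]
student = [[0.6 if y else 0.4 for y in row] for row in label]
print(round(self_distillation_loss(student, teacher, label), 4))
```

In this sketch the distillation terms vanish when the student reproduces the teacher exactly, leaving only the supervised cross-entropy term. In a real training setup the teacher map would normally be treated as a constant (no gradient), and the same structure could be written with a deep learning framework's pooling and loss layers.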


