Acta Electronica Sinica, 2023, Vol. 51, Issue (3): 746-756. DOI: 10.12263/DZXB.20210169
ZHENG Yun-fei1,2,3, WANG Xiao-bing1,2, ZHANG Xiong-wei1, CAO Tie-yong1, SUN Meng1
Received: 2021-01-26
Revised: 2022-05-15
Online: 2023-03-25
Published: 2023-04-20
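Abstract:
Knowledge distillation can effectively transfer the representational capacity of a teacher network to a student network, improving performance without changing the network architecture. It is therefore worthwhile to build a self-distillation learning model into HRNet (High-Resolution Net), a high-performing backbone for object segmentation. Because the parallel structure of HRNet already fuses deep and shallow information thoroughly, direct distillation is difficult to apply. To address this challenge, this paper proposes a structured self-distillation learning model based on a multi-scale pooling pyramid: a multi-scale pooling pyramid representation module is introduced into the HRNet branches to improve the network's knowledge representation and learning ability; two distillation modes, "top-down" and "consistency", are constructed; and cross-entropy loss, KL (Kullback-Leibler) divergence loss and structural similarity loss are combined for self-distillation learning. Experiments on four segmentation datasets containing salient objects and camouflaged objects show that the proposed model effectively improves the network's object segmentation performance without increasing resource overhead.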
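The multi-scale pooling pyramid module is not specified in detail on this page. As a rough illustration only, the sketch below follows the pyramid pooling idea of PSPNet (Zhao et al., CVPR 2017): average-pool a feature map to several grid sizes, upsample each pooled map back to the input size, and stack the results with the input. The bin sizes (1, 2, 3, 6), nearest-neighbour upsampling, the single channel and the omission of the usual 1×1 convolutions are simplifying assumptions, not the authors' configuration; all function names are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def adaptive_avg_pool(x: Grid, bins: int) -> Grid:
    """Average-pool an H x W map into a bins x bins grid.

    Bin edges follow the usual adaptive-pooling rule:
    start = floor(i * H / bins), end = ceil((i + 1) * H / bins).
    """
    h, w = len(x), len(x[0])
    pooled = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, -(-((i + 1) * h) // bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            cells = [x[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(p: Grid, h: int, w: int) -> Grid:
    """Nearest-neighbour upsampling of a pooled grid back to H x W."""
    ph, pw = len(p), len(p[0])
    return [[p[(r * ph) // h][(c * pw) // w] for c in range(w)] for r in range(h)]


def pyramid_pool(x: Grid, bin_sizes: Sequence[int] = (1, 2, 3, 6)) -> List[Grid]:
    """Return the input map followed by one upsampled pooled map per bin size.

    In a real network these maps would be concatenated along the channel axis
    (usually after 1x1 convolutions); here they are simply returned as a list.
    """
    h, w = len(x), len(x[0])
    return [x] + [upsample_nearest(adaptive_avg_pool(x, b), h, w) for b in bin_sizes]


# Usage on a toy 6 x 6 single-channel "feature map".
feature = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
levels = pyramid_pool(feature)
print(len(levels), levels[1][0][0])  # 5 17.5
```

The coarse levels (1×1, 2×2) summarise global and regional context, while the finest level here (6×6 on a 6×6 input) reproduces the input, which is why pyramids are normally applied to feature maps much larger than the largest bin.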
Yun-fei ZHENG, Xiao-bing WANG, Xiong-wei ZHANG, et al. The Self-Distillation HRNet Object Segmentation Based on the Pyramid Knowledge[J]. Acta Electronica Sinica, 2023, 51(3): 746-756.
Model | COD | CPD | DUT-OMRON | PASCAL-S | Mean gain |
---|---|---|---|---|---|
HRNet18-BL | 63.93 | 63.52 | 81.36 | 82.33 | |
HRNet18-YOT | 64.02(+0.09) | 63.58(+0.06) | 81.41(+0.05) | 82.37(+0.04) | +0.060 |
HRNet48-BL | 70.29 | 71.91 | 84.46 | 85.79 | |
HRNet48-YOT | 70.33(+0.04) | 71.94(+0.03) | 84.49(+0.03) | 85.80(+0.01) | +0.027 |
Table 1 Fβ comparison for self-distillation learning (%)
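The last column matches the arithmetic mean of the four per-dataset gains; for HRNet18-YOT, (0.09 + 0.06 + 0.05 + 0.04) / 4 = 0.060. The short check below recomputes this from the Table 1 values. The helper names are illustrative, and rounding each gain to two decimals is an assumption chosen to mirror the table.

```python
from typing import List


def gains(baseline: List[float], model: List[float]) -> List[float]:
    """Per-dataset Fβ gain in percentage points, rounded to two decimals as in the tables."""
    return [round(m - b, 2) for b, m in zip(baseline, model)]


def mean_gain(baseline: List[float], model: List[float]) -> float:
    """Arithmetic mean of the per-dataset gains (the last column of the table)."""
    g = gains(baseline, model)
    return sum(g) / len(g)


# Fβ (%) on COD, CPD, DUT-OMRON and PASCAL-S, copied from Table 1.
hrnet18_bl = [63.93, 63.52, 81.36, 82.33]
hrnet18_yot = [64.02, 63.58, 81.41, 82.37]
hrnet48_bl = [70.29, 71.91, 84.46, 85.79]
hrnet48_yot = [70.33, 71.94, 84.49, 85.80]

print(gains(hrnet18_bl, hrnet18_yot))               # [0.09, 0.06, 0.05, 0.04]
print(f"{mean_gain(hrnet18_bl, hrnet18_yot):.3f}")  # 0.060
print(f"{mean_gain(hrnet48_bl, hrnet48_yot):.4f}")  # 0.0275
```

For HRNet48-YOT the same computation gives 0.0275, which the table reports as +0.027.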
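Fig. 6 Output features of the original HRNet and of two self-distillation HRNets (H1: original HRNet; H2: HRNet with cross-entropy + KL divergence self-distillation; H3: HRNet with cross-entropy + KL divergence + structural similarity self-distillation; H1/2/3-1 to H1/2/3-4: HRNet branches 1 to 4)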
Model | COD | CPD | DUT-OMRON | PASCAL-S | Mean gain |
---|---|---|---|---|---|
HRNet18-BL | 63.93 | 63.52 | 81.36 | 82.33 | |
HRNet18-SDL | 65.67(+1.73) | 66.43(+2.91) | 83.25(+1.64) | 85.29(+2.44) | +2.178 |
HRNet48-BL | 70.29 | 71.91 | 84.46 | 85.79 | |
HRNet48-SDL | 71.66(+1.40) | 73.61(+1.50) | 85.75(+1.35) | 87.08(+1.80) | +1.513 |
Table 2 Fβ comparison for the self-distillation learning model (%)
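Tables 1 to 4 report Fβ in percent, but this page does not state the evaluation protocol. The sketch below uses the form common in salient object detection, Fβ = (1 + β²)·P·R / (β²·P + R) with β² = 0.3, on a prediction binarized at a fixed threshold of 0.5. Both settings are assumptions: the authors may instead use an adaptive threshold (for example twice the mean saliency, as in Achanta et al., 2009) or report the maximum over thresholds.

```python
from typing import Sequence


def f_beta(pred: Sequence[float], gt: Sequence[int],
           threshold: float = 0.5, beta2: float = 0.3, eps: float = 1e-12) -> float:
    """F-measure of a binarized prediction map against a binary ground-truth map.

    pred: predicted foreground probabilities in [0, 1] (flattened map).
    gt:   ground-truth labels, 1 = object, 0 = background (same length).
    beta2 = 0.3 weights precision more than recall (assumed setting).
    """
    tp = fp = fn = 0
    for p, g in zip(pred, gt):
        predicted_fg = p >= threshold
        if predicted_fg and g:
            tp += 1
        elif predicted_fg:
            fp += 1
        elif g:
            fn += 1
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)


# Toy 2 x 2 map, flattened: two hits, one false alarm, one miss.
score = f_beta([0.9, 0.8, 0.2, 0.7], [1, 1, 1, 0])
print(f"{100 * score:.2f}")  # 66.67 (reported in percent, as in the tables)
```

When precision equals recall, as in this toy case (both 2/3), Fβ equals that common value for any β.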
Model | COD | CPD | DUT-OMRON | PASCAL-S | Mean gain |
---|---|---|---|---|---|
HRNet18-BL | 63.93 | 63.52 | 81.36 | 82.33 | |
HRNet18-T2D | 65.15(+1.21) | 65.78(+2.26) | 82.70(+1.09) | 83.19(+0.86) | +1.163 |
HRNet18-CST | 64.56(+0.62) | 65.35(+1.83) | 82.27(+0.66) | 83.61(+1.28) | +1.048 |
HRNet18-T2D+CST | 71.66(+1.40) | 73.61(+1.50) | 85.75(+1.35) | 87.08(+1.80) | +1.513 |
HRNet48-BL | 70.26 | 72.11 | 84.40 | 85.28 | |
HRNet48-T2D | 71.02(+0.76) | 73.01(+1.10) | 85.32(+0.92) | 87.00(+1.21) | +0.938 |
HRNet48-CST | 71.24(+0.98) | 72.68(+0.77) | 85.18(+0.78) | 86.35(+1.07) | +0.755 |
HRNet48-T2D+CST | 71.66(+1.40) | 73.61(+1.50) | 85.75(+1.35) | 87.08(+1.80) | +1.513 |
Table 3 Fβ comparison in the distillation-mode ablation (T2D: top-down distillation mode; CST: consistency distillation mode) (%)
Model | COD | CPD | DUT-OMRON | PASCAL-S | Mean gain / share of full gain (%) |
---|---|---|---|---|---|
HRNet18-BL | 63.94 | 63.52 | 81.61 | 82.85 | |
HRNet18-CE | 64.49(+0.55) | 64.37(+0.85) | 82.44(+0.63) | 83.69(+0.84) | +0.718/32.9 |
HRNet18-CE+KL | 65.12(+1.18) | 66.08(+1.76) | 82.77(+1.16) | 84.83(+1.98) | +1.520/69.8 |
HRNet18-CE+KL+SSIM | 65.67(+1.73) | 66.43(+2.91) | 83.25(+1.64) | 85.29(+2.44) | +2.178 |
HRNet48-BL | 70.26 | 72.11 | 84.40 | 85.28 | |
HRNet48-CE | 70.87(+0.61) | 72.60(+0.49) | 84.82(+0.42) | 85.72(+0.44) | +0.490/32.4 |
HRNet48-CE+KL | 71.21(+0.95) | 73.01(+1.10) | 85.26(+0.86) | 86.57(+1.29) | +1.050/69.4 |
HRNet48-CE+KL+SSIM | 71.66(+1.40) | 73.61(+1.50) | 85.75(+1.35) | 87.08(+1.80) | +1.513 |
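Table 4 Fβ comparison in the learning-metric ablation (CE: cross-entropy loss; KL: KL divergence loss; SSIM: structural similarity loss) (%). In the last column, the value after the slash is the row's mean gain as a percentage of the mean gain of the corresponding full CE+KL+SSIM model (e.g., 0.490/1.513 ≈ 32.4%).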
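The abstract and Table 4 combine a cross-entropy (CE) loss, a KL divergence loss and a structural similarity (SSIM) loss, but the exact formulation is not given here. The following is only a sketch under stated assumptions: maps are flattened lists of foreground probabilities in [0, 1]; a deeper branch's prediction acts as teacher for a shallower branch's prediction; KL and SSIM are measured between student and teacher; SSIM is computed over the whole map as a single window rather than with the usual 11×11 Gaussian windows; and the weights `w_kl` and `w_ssim` are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-7


def _clip(v: float) -> float:
    """Keep probabilities away from 0 and 1 so the logarithms stay finite."""
    return min(max(v, EPS), 1 - EPS)


def bce_loss(pred: Sequence[float], gt: Sequence[int]) -> float:
    """Mean pixel-wise binary cross-entropy against the ground truth (CE term)."""
    return -sum(y * math.log(_clip(p)) + (1 - y) * math.log(1 - _clip(p))
                for p, y in zip(pred, gt)) / len(pred)


def kl_loss(teacher: Sequence[float], student: Sequence[float]) -> float:
    """Mean pixel-wise KL(teacher || student) between Bernoulli foreground maps (KL term)."""
    total = 0.0
    for t, s in zip(teacher, student):
        t, s = _clip(t), _clip(s)
        total += t * math.log(t / s) + (1 - t) * math.log((1 - t) / (1 - s))
    return total / len(teacher)


def ssim_global(x: Sequence[float], y: Sequence[float],
                c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> float:
    """SSIM (Wang et al., 2004) over the whole map as one window, dynamic range 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def self_distill_loss(student, teacher, gt, w_kl=1.0, w_ssim=1.0) -> float:
    """CE + w_kl * KL + w_ssim * (1 - SSIM); the weights are hypothetical.

    In an autograd framework the teacher map would normally be detached,
    so that only the student branch receives the distillation gradients.
    """
    return (bce_loss(student, gt)
            + w_kl * kl_loss(teacher, student)
            + w_ssim * (1.0 - ssim_global(student, teacher)))


gt = [1, 1, 0, 0]
teacher = [0.95, 0.85, 0.10, 0.05]  # e.g. a deeper branch's prediction (assumed)
student = [0.70, 0.60, 0.40, 0.20]  # e.g. a shallower branch's prediction (assumed)
print(round(self_distill_loss(student, teacher, gt), 4))
```

When the student already matches the teacher, the KL term is 0 and SSIM is 1, so the loss reduces to the plain cross-entropy; the extra terms only act where the two maps disagree in value or in structure.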