Infrared-Visible Image Fusion via Heterogeneous Multi-Level Distillation

ZHANG Qi; SONG Hong; LI Jin-fu; MA Shi-han; LIN Yu-cong; YANG Jian

doi:10.12263/DZXB.20250764

Special Column on the Chinese Institute of Electronics Science and Technology Award | Updated: 2026-04-24

- Infrared-Visible Image Fusion via Heterogeneous Multi-Level Distillation
- Acta Electronica Sinica, 2025, Vol. 53, No. 12, pp. 4250-4266
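- Author affiliations:
  
  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081
  2. School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081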
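- Author biographies:
  
  ZHANG Qi, male, was born in Tieling, Liaoning Province, in March 2004. He is currently a master's student at the School of Computer Science and Technology, Beijing Institute of Technology. His main research interest is computer vision. E-mail: zq@bit.edu.cn
  SONG Hong, female, was born in Xi'an, Shaanxi Province, in October 1977. She is currently a professor and doctoral supervisor at the School of Computer Science and Technology, Beijing Institute of Technology. She has received six awards, including the First Prize of the Chinese Institute of Electronics Science and Technology Progress Award and the First Prize of the Wu Wenjun Artificial Intelligence Science and Technology Progress Award, and has published more than 100 academic papers in China and abroad. Her main research interest is computer vision. E-mail: songhong@bit.edu.cn
  LI Jin-fu, male, was born in Xianning, Hubei Province, in August 1990. He is currently a postdoctoral researcher at Beijing Institute of Technology. He has published more than 10 academic papers in China and abroad and has led national and provincial/ministerial projects, including a sub-project of the National Key Research and Development Program, a Beijing Natural Science Foundation project, and a Sichuan Science and Technology Support Program project. His main research interests are multimodal image fusion and object detection. E-mail: jinfuli@bit.edu.cn
  MA Shi-han, male, was born in Zaozhuang, Shandong Province, in December 1998. He is currently a doctoral student at the School of Computer Science and Technology, Beijing Institute of Technology. His main research interest is computer vision. E-mail: mashihan@bit.edu.cn
  LIN Yu-cong, male, was born in Nanning, Guangxi Zhuang Autonomous Region, in December 1993. He is currently a distinguished associate research fellow at the School of Optics and Photonics, Beijing Institute of Technology. He has published more than 10 academic papers in China and abroad, leads a Young Scientists Fund project of the National Natural Science Foundation of China, and has participated in several national projects as a key member. His main research interest is intelligent analysis of multimodal medical data. E-mail: linyucongbit@bit.edu.cn
  YANG Jian, male, was born in Chuxiong Prefecture, Yunnan Province, in October 1977. He is currently a professor and doctoral supervisor at the School of Optics and Photonics, Beijing Institute of Technology. He has received more than 20 research awards at or above the provincial/ministerial level, including the Second Prize of the National Technology Invention Award and the First Prize of the Ministry of Education Technology Invention Award, and has published more than 300 academic papers in China and abroad. His main research interest is computer vision. Chinese Institute of Electronics membership number: E190013149S. E-mail: jyang@bit.edu.cn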
- Funding:
  
  Beijing Natural Science Foundation (L242024); National Natural Science Foundation of China (U22A2052)
- Chinese Library Classification (CLC) numbers: TP391.4; TH701
- Received: 2025-09-04; Accepted: 2025-12-08; Published in print: 2025-12-25
ZHANG Qi, SONG Hong, LI Jin-fu, et al. Infrared-Visible Image Fusion via Heterogeneous Multi-Level Distillation[J]. Acta Electronica Sinica, 2025, 53(12): 4250-4266. DOI: 10.12263/DZXB.20250764

Abstract

Knowledge distillation transfers the representation capability of a complex teacher network to a lightweight student network, thereby improving model performance and deployment efficiency. However, existing knowledge distillation-based multimodal image fusion methods often neglect the heterogeneity of feature representations and modality preferences between teacher and student networks, as well as the inherent differences between modalities. This results in inefficient knowledge transfer, insufficient semantic alignment, and degraded fusion performance. To address these issues, we propose an infrared and visible image fusion method based on multi-level knowledge distillation across heterogeneous models. Specifically, a cross-level knowledge transfer mechanism is designed: at the feature level, attention guides the precise transfer of infrared salient targets and visible-light textures; at the relation level, similarity-based relational matching and topological structure alignment improve cross-modal semantic adaptation; and at the output level, response constraints ensure both the visual consistency and the semantic integrity of the fused results, alleviating the information imbalance caused by mismatched modality preferences between teacher and student networks. In addition, we construct a task-adapted lightweight CNN-Transformer dual-branch student network that combines global information modeling with local detail perception, strengthening its ability to receive and integrate heterogeneous knowledge. Experiments on the MSRS, RoadScene, TNO, and M3FD datasets show that, under the guidance of three teacher models with markedly different architectures, the proposed method outperforms both the teacher models and existing methods on four metrics: correlation coefficient (CC), peak signal-to-noise ratio (PSNR), sum of the correlations of differences (SCD), and structural similarity index measure (SSIM). The model has only 0.0772 M parameters and an inference time of 31.22 ms on a server, so it improves fusion performance and distillation robustness while keeping the fusion network lightweight and real-time. On the Jetson AGX Xavier edge platform, the inference time is 250.31 ms, indicating good suitability for edge deployment and practical applications.
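
The abstract names three distillation levels (feature, relation, output) without giving their formulas. The NumPy sketch below is only an illustration of how such a three-term loss could be assembled from standard techniques; it is not the authors' formulation. The function names, the choice of squared-activation spatial attention, batch-wise cosine similarity, the L1 response term, and the equal weights are all assumptions, and the topological structure alignment mentioned in the abstract is omitted. Both the attention maps and the similarity matrices are independent of channel count, which is why a 64-channel teacher and a 16-channel student can be compared directly, provided their spatial sizes match.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero when normalising


def spatial_attention(feat):
    """Collapse channels into an L2-normalised spatial attention map.

    feat: array of shape (B, C, H, W). Returns shape (B, H*W).
    """
    att = np.mean(feat ** 2, axis=1).reshape(feat.shape[0], -1)
    return att / (np.linalg.norm(att, axis=1, keepdims=True) + EPS)


def feature_loss(f_s, f_t):
    """Feature level (assumed form): match student and teacher attention maps."""
    diff = spatial_attention(f_s) - spatial_attention(f_t)
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def batch_similarity(feat):
    """Pairwise cosine similarity between samples in a batch, shape (B, B)."""
    flat = feat.reshape(feat.shape[0], -1)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + EPS)
    return flat @ flat.T


def relation_loss(f_s, f_t):
    """Relation level (assumed form): match the two similarity matrices."""
    return float(np.mean((batch_similarity(f_s) - batch_similarity(f_t)) ** 2))


def response_loss(y_s, y_t):
    """Output level (assumed form): L1 distance between fused images."""
    return float(np.mean(np.abs(y_s - y_t)))


def multi_level_loss(f_s, f_t, y_s, y_t, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the weights are hypothetical."""
    w_f, w_r, w_o = weights
    return (w_f * feature_loss(f_s, f_t)
            + w_r * relation_loss(f_s, f_t)
            + w_o * response_loss(y_s, y_t))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f_teacher = rng.standard_normal((4, 64, 8, 8))  # wide teacher features
    f_student = rng.standard_normal((4, 16, 8, 8))  # narrow student features
    y_teacher = rng.random((4, 1, 8, 8))            # teacher fused output
    y_student = rng.random((4, 1, 8, 8))            # student fused output
    print(multi_level_loss(f_student, f_teacher, y_student, y_teacher))
```

In a real training loop these terms would be written in an automatic-differentiation framework and combined with the fusion loss itself; the NumPy version only shows the shapes and the arithmetic.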
