Textual Semantic Guidance for Infrared and Visible Image Fusion

ZHU Mingrui; CHEN Xiru; WEI Xin; WANG Nannan; GAO Xinbo

doi:10.12263/DZXB.20250906


PAPERS | Updated: 2026-06-04

- ACTA ELECTRONICA SINICA Vol. 54, Issue 1, Pages: 86-101 (2026)
- Affiliation: State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, Shaanxi 710071, China
- Funding: National Natural Science Foundation of China (62576261; U22A2096)
- CLC: TP391.41
- Received: 21 December 2025; Accepted: 06 January 2026; Published: 25 January 2026
ZHU Mingrui, CHEN Xiru, WEI Xin, et al. Textual Semantic Guidance for Infrared and Visible Image Fusion[J]. Acta Electronica Sinica, 2026, 54(01): 86-101. DOI: 10.12263/DZXB.20250906


Abstract

Infrared and visible image fusion (IVF) aims to integrate the complementary information contained in the two image modalities by combining the salient targets in infrared images with the rich texture details of visible images. Through this integration, IVF produces fused images that are more informative and comprehensive than either single-modality input. Existing research has demonstrated that deep learning-based fusion methods have achieved remarkable progress in improving fused image quality. However, most of these approaches focus mainly on low-level visual features, and the deep semantic associations between high-level semantic information and visual features have not yet been sufficiently explored. In recent years, with the rapid development of large vision-language models (VLMs), text-guided image fusion methods have shown great potential owing to their flexibility and versatility. However, the effective integration and utilization of textual semantic information in the image fusion process remain insufficiently studied. To tackle these challenges, this paper proposes a textual semantic guidance (TeSG) method for infrared and visible image fusion, which guides the fusion process toward the needs of downstream tasks such as object detection and semantic segmentation. By explicitly introducing high-level semantic information generated by VLMs into the fusion pipeline, TeSG achieves precise regulation of the fusion process and enhances the semantic consistency of the fused results. TeSG introduces textual semantics at two levels: the text semantic level and the mask semantic level. First, textual descriptions automatically generated by VLMs serve as global text-level semantic guidance, providing high-level semantic constraints for the fusion process. Second, mask semantics corresponding to key target regions are constructed from these textual descriptions, enabling accurate localization and differentiated modeling of foreground and background regions. Building on this, three core modules are designed to implement the proposed framework. The semantic information generator (SIG) module generates both mask semantics and text semantics from the automatically produced textual descriptions. The mask-guided cross-attention (MGCA) module performs a preliminary attention-based fusion of infrared and visible visual features under the guidance of mask semantics, thereby realizing mask-level cross-modal feature interaction. Finally, the text-driven attentional fusion (TDAF) module achieves text-level fusion and dynamic weighting through text-guided attention and a gating mechanism, allowing semantic cues to adaptively modulate the contribution of each modality. Experimental results demonstrate that, through its dual-level textual semantic guidance paradigm, TeSG performs favorably against existing state-of-the-art (SOTA) methods in preserving multimodal texture information and enhancing contrast in the fused images. In addition, TeSG yields superior performance in downstream tasks such as object detection and semantic segmentation, highlighting its task-oriented fusion capability. Compared with current SOTA image fusion approaches, TeSG achieves an average improvement of 1.4% on downstream tasks, validating its competitiveness and effectiveness while also exhibiting strong generalization across different datasets and scene conditions. The proposed method effectively addresses the insufficient exploration of deep correlations between textual and visual features in existing image fusion algorithms, achieving simultaneous improvements in fusion quality and downstream task performance.
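The abstract describes the MGCA module only at a high level. The following minimal NumPy sketch shows one plausible reading of "mask-guided cross-attention", and it is not the authors' implementation. It makes several assumptions:

- Both modalities are flattened into N tokens of C channels on the same spatial grid.
- The SIG mask has been downsampled to one binary foreground label per token.
- Mask guidance is realized as an additive attention bias that keeps foreground tokens from attending to background tokens, and vice versa.

The function names, shapes and the `penalty` parameter are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax: subtract the row maximum before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mask_guided_cross_attention(feat_q, feat_kv, mask, w_q, w_k, w_v, penalty=-1e9):
    """Hypothetical MGCA-style cross-attention (illustrative, not the paper's code).

    feat_q:   (N, C) tokens of the query modality (e.g. infrared).
    feat_kv:  (N, C) tokens of the other modality (e.g. visible), same grid.
    mask:     (N,) per-token label, 1 = target (foreground), 0 = background.
    w_q, w_k: (C, D) query/key projections; w_v: (C, C) value projection.
    Returns the attention matrix and the residual-updated query features.
    """
    q = feat_q @ w_q
    k = feat_kv @ w_k
    v = feat_kv @ w_v
    # Scaled dot-product scores between every query token and every key token.
    logits = q @ k.T / np.sqrt(q.shape[-1])
    # Assumed mask guidance: heavily penalize pairs whose mask labels differ,
    # so foreground attends to foreground and background to background.
    # The diagonal always matches, so every row keeps at least one valid key.
    same_region = mask[:, None] == mask[None, :]
    logits = np.where(same_region, logits, logits + penalty)
    attn = softmax(logits, axis=-1)
    # Residual connection keeps the query modality's own information.
    return attn, feat_q + attn @ v


# Usage with random weights; in a trained network these would be learned.
rng = np.random.default_rng(0)
N, C, D = 6, 8, 4
ir = rng.standard_normal((N, C))
vis = rng.standard_normal((N, C))
mask = np.array([1, 1, 0, 0, 0, 1])
w_q, w_k = rng.standard_normal((C, D)), rng.standard_normal((C, D))
w_v = rng.standard_normal((C, C))

# Bidirectional interaction: infrared queries over visible keys, and vice versa.
attn_ir, ir_out = mask_guided_cross_attention(ir, vis, mask, w_q, w_k, w_v)
attn_vis, vis_out = mask_guided_cross_attention(vis, ir, mask, w_q, w_k, w_v)
```

The sketch reuses the same projections in both directions only for brevity. A hard region mask is just one option; a soft, learned bias derived from the mask would be an equally reasonable way to inject mask semantics.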
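The TDAF module is summarized in the abstract as "text-guided attention and a gating mechanism". The following NumPy sketch illustrates that idea under explicit assumptions, and it is not the paper's design:

- The scene description has already been encoded into a single text embedding. A random vector stands in here, and any text encoder could supply it.
- The text embedding is projected to a query that scores each visual token of each modality.
- A per-token sigmoid gate on the difference of those scores decides how much each modality contributes.

The gate formula, projections and names are hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    # Overflow-free logistic function: sigmoid(x) = 0.5 * (1 + tanh(x / 2)).
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def text_gated_fusion(ir, vis, text_emb, w_t, w_k):
    """Hypothetical TDAF-style fusion (illustrative, not the paper's code).

    ir, vis:  (N, C) token features of the two modalities on the same grid.
    text_emb: (T,) sentence embedding of the scene description.
    w_t: (T, D) projects the text embedding to an attention query.
    w_k: (C, D) projects visual tokens to keys (shared by both modalities).
    Returns the fused tokens and the per-token gate (weight on infrared).
    """
    query = text_emb @ w_t                        # (D,) text query
    scale = np.sqrt(query.shape[0])
    # Text-guided attention scores: relevance of each token to the description.
    score_ir = (ir @ w_k) @ query / scale         # (N,)
    score_vis = (vis @ w_k) @ query / scale       # (N,)
    # Assumed gate: favour whichever modality the text attends to more, per token.
    gate = sigmoid(score_ir - score_vis)[:, None]  # (N, 1), values in [0, 1]
    # Per-token convex combination of the two modalities.
    fused = gate * ir + (1.0 - gate) * vis
    return fused, gate[:, 0]


rng = np.random.default_rng(1)
N, C, T, D = 6, 8, 16, 4
ir = rng.standard_normal((N, C))
vis = rng.standard_normal((N, C))
text_emb = rng.standard_normal(T)
w_t, w_k = rng.standard_normal((T, D)), rng.standard_normal((C, D))

fused, gate = text_gated_fusion(ir, vis, text_emb, w_t, w_k)
```

Because the gate is a sigmoid, each fused token is a convex combination of its infrared and visible counterparts. When the two inputs are identical, the gate sits at 0.5 and the output equals the input. A small learned gating network over concatenated visual and text features would be an equally plausible alternative to this score-difference gate.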


