

State Key Laboratory of Space-Air-Ground Integrated Services Networks, Xidian University, Xi'an, Shaanxi 710071, China
Received: 21 December 2025
Accepted: 06 January 2026
Published: 25 January 2026
ZHU Mingrui, CHEN Xiru, WEI Xin, et al. Textual Semantic Guidance for Infrared and Visible Image Fusion[J]. Acta Electronica Sinica, 2026, 54(01): 86-101. DOI:10.12263/DZXB.20250906
Infrared and visible image fusion (IVF) aims to integrate the complementary information in the two modalities, combining the salient targets of infrared images with the rich texture details of visible images to produce fused images that are more informative and of higher visual quality than either single-modality input. Deep learning-based fusion methods have markedly improved fused image quality, but most of them still model only low-level visual features, and the deep associations between high-level semantic information and visual features remain insufficiently explored. With the rapid development of large vision-language models (VLMs), text-guided image fusion has shown great potential owing to its flexibility and versatility; however, how to integrate and exploit textual semantic information effectively in the fusion process has not been studied in depth. To address these problems, this paper proposes a textual semantic guidance method for infrared and visible image fusion, termed Textual Semantic Guidance (TeSG). TeSG targets downstream vision tasks such as object detection and semantic segmentation, and explicitly introduces high-level semantic information generated by VLMs into the fusion pipeline to regulate the fusion process precisely and to enhance the semantic consistency of the fused results. Textual semantics are introduced at two levels, the text semantic level and the mask semantic level. First, textual descriptions automatically generated by VLMs serve as global text-level guidance, providing high-level semantic constraints for the fusion process. Second, mask semantics for key target regions are constructed from these descriptions, enabling accurate localization and differentiated modeling of foreground and background regions. On this basis, three core modules are designed. The semantic information generator (SIG) module produces both mask semantics and text semantics from the automatically generated descriptions. The mask-guided cross-attention (MGCA) module performs a preliminary attention-based fusion of infrared and visible visual features under the guidance of mask semantics, realizing mask-level cross-modal feature interaction. The text-driven attentional fusion (TDAF) module achieves semantic-level fusion and dynamic weighting through text-guided attention and a gating mechanism, allowing semantic cues to modulate the contribution of each modality adaptively. Experimental results show that TeSG, with its dual-level semantic guidance paradigm, performs favorably against existing state-of-the-art (SOTA) methods in preserving multimodal texture and contrast in the fused images. It also achieves better performance on downstream object detection and semantic segmentation, with an average improvement of 1.4% over the current best image fusion methods, which validates its competitiveness and effectiveness, and it exhibits strong generalization across datasets and scene conditions. The proposed method thus addresses the insufficient exploration of deep correlations between textual and visual features in existing image fusion algorithms, improving both fusion quality and downstream task performance.
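The abstract describes the MGCA and TDAF modules only at a high level. As a rough illustration, and not the authors' implementation, the sketch below shows one way the two ideas could be realized on toy token lists in plain Python: a cross-attention step in which a binary foreground mask biases the attention logits, followed by a sigmoid gate driven by a text embedding that weights the two modalities. The function names, the additive mask bias, the `bias` parameter, and the similarity-based gate are all assumptions made for this example; the paper's actual formulation may differ.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mask_guided_cross_attention(queries, keys, values, mask, bias=2.0):
    """Cross-attention from one modality (queries) to the other (keys/values).

    Assumption: key tokens flagged as foreground (mask == 1) receive an
    additive `bias` on their logits, so every query attends to them more.
    """
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        logits = [dot(q, k) / scale + bias * m for k, m in zip(keys, mask)]
        weights = softmax(logits)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

def text_gated_fusion(ir_feat, vis_feat, text_emb):
    """Blend two feature sets with a per-token sigmoid gate.

    Assumption: the gate compares how strongly the text embedding aligns
    with the infrared feature versus the visible feature.
    """
    fused = []
    for f_ir, f_vis in zip(ir_feat, vis_feat):
        score = dot(text_emb, f_ir) - dot(text_emb, f_vis)
        g = 1.0 / (1.0 + math.exp(-score))
        fused.append([g * a + (1.0 - g) * b for a, b in zip(f_ir, f_vis)])
    return fused

# Toy example: two tokens per modality, two channels each (hypothetical data).
ir = [[1.0, 0.0], [0.0, 1.0]]
vis = [[0.5, 0.5], [1.0, 1.0]]
mask = [1, 0]  # first visible token lies in the foreground
attended = mask_guided_cross_attention(ir, vis, vis, mask)
fused = text_gated_fusion(ir, attended, text_emb=[1.0, 0.0])
```

An additive bias, rather than a hard mask that zeroes background logits, keeps background context available to every query while still favoring target regions. A learned projection would normally replace the raw dot products used here.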