Dual-Stream Attention Image Inpainting Method Based on Interacting and Fusing Internal-External Features

HUANG Guang-yuan; HUANG Rong; ZHOU Shu-bo; JIANG Xue-qin

doi:10.12263/DZXB.20240780


Research article | Updated: 2025-07-24

- Dual-Stream Attention Image Inpainting Method Based on Interacting and Fusing Internal-External Features
- Acta Electronica Sinica, 2025, Vol. 53, No. 4, pp. 1293-1307
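- Author affiliations:
  
  1. College of Information Science and Technology, Donghua University, Shanghai 201620
  2. Engineering Research Center of Digitized Textile & Apparel Technology, Ministry of Education, Donghua University, Shanghai 201620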
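- About the authors:
  
  HUANG Guang-yuan, male, born in 2000, native of Xuzhou, Jiangsu. Master's student at the College of Information Science and Technology, Donghua University. His main research interest is image inpainting. E-mail: gyhuang@mail.dhu.edu.cn
  HUANG Rong, male, born in 1985, native of Shaoxing, Zhejiang. Associate professor at the College of Information Science and Technology, Donghua University. His main research interests include image inpainting and semantic segmentation. E-mail: rong.huang@dhu.edu.cn
  ZHOU Shu-bo, male, born in 1988, native of Shaoxing, Zhejiang. Assistant researcher at the College of Information Science and Technology, Donghua University. His main research interests include computational imaging and industrial visual inspection. E-mail: zhoushubo@dhu.edu.cn
  JIANG Xue-qin, male, born in 1981, native of Suzhou, Jiangsu. Professor at the College of Information Science and Technology, Donghua University. His main research interests include graph signal processing and industrial visual inspection. E-mail: xqjiang@dhu.edu.cn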
- Funding:
  
  National Natural Science Foundation of China (62001099); Fundamental Research Funds for the Central Universities (2232023D-30)
- DOI: 10.12263/DZXB.20240780
  Chinese Library Classification (CLC) number: TP391
- Received: 2024-08-28,
  
  Revised: 2025-01-09,
  
  Published in print: 2025-04-25

HUANG Guang-yuan, HUANG Rong, ZHOU Shu-bo, et al. Dual-Stream Attention Image Inpainting Method Based on Interacting and Fusing Internal-External Features[J]. Acta Electronica Sinica, 2025, 53(04): 1293-1307.

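Abstract (translated from the Chinese)

Attention mechanisms and their variants are widely used in deep-learning-based image inpainting. They partition a corrupted image into intact and missing regions and capture long-range contextual information from the intact regions to fill the missing regions. As the missing region grows, fewer intact-region features remain, which limits the attention mechanism and degrades the inpainting result. To widen the range of context that attention can capture, this paper learns visual atoms through a vector-quantized codebook. These visual atoms characterize the structure, texture, and other features of image patches and form external features for image inpainting, compensating for the shortage of features in the intact regions of the image. On this basis, this paper proposes a dual-stream attention image inpainting method based on interacting and fusing internal and external features. The method combines the internal and external information sources and designs an internal mask attention and an internal-external cross attention. Together they form a dual-stream attention that realizes interaction among internal features and between internal and external features, producing internal-source and external-source inpainting features. The internal mask attention uses a mask to shield interference from missing-region features and captures context only in the intact regions, producing internal-source inpainting features. The internal-external cross attention computes the similarity between internal features and the external features composed of visual atoms to realize interaction between them, producing external-source inpainting features. In addition, this paper designs a controllable feature fusion module that uses the correlation between internal-source and external-source inpainting features to generate a spatial weight map, which selects between the internal-source and external-source inpainting features at each spatial position and thereby fuses internal and external features. Experiments on three public datasets, Places2, FFHQ, and Paris StreetView, show that the proposed method improves on other advanced methods by 3.45%, 1.34%, 13.91%, 13.64%, and 16.92% on average in the PSNR, SSIM, ℓ1, LPIPS, and FID metrics, respectively. Ablation and visualization results show that both the internal image features and the external features composed of visual atoms benefit the repair of corrupted images.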
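The internal mask attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes features flattened to one vector per spatial position, uses the features directly as queries, keys, and values (no learned projections), and the names `internal_mask_attention`, `features`, and `valid` are hypothetical. It shows only the core idea that missing-region positions are excluded as keys, so every position aggregates context from intact positions only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def internal_mask_attention(features, valid):
    """Attend only to intact positions.

    features: (N, C) array, one feature vector per spatial position.
    valid:    (N,) boolean array, True where the pixel region is intact.
    Returns the attended features (N, C) and the attention weights (N, N).
    Assumption: queries, keys, and values are the raw features themselves.
    """
    if not np.any(valid):
        raise ValueError("at least one intact position is required")
    n, c = features.shape
    scores = features @ features.T / np.sqrt(c)          # (N, N) similarities
    # Mask out keys in the missing region so they cannot contribute.
    scores = np.where(valid[None, :], scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ features, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(6, 4))
    valid = np.array([True, True, False, True, False, True])
    out, w = internal_mask_attention(feats, valid)
    print(out.shape, w[:, ~valid].max())
```

Every row of the weight matrix still sums to one, but the columns belonging to missing positions are exactly zero. As the missing region grows, fewer columns remain available, which is the limitation the abstract attributes to attention that relies on internal features alone.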

Abstract

The attention mechanism and its variants have been widely applied in the field of image inpainting. They divide corrupted images into complete and missing regions, and capture long-range contextual information only within the complete regions to fill in the missing regions. As the area of missing regions increases, the features of complete regions decrease, which limits the performance of the attention mechanisms and leads to suboptimal inpainting results. In order to extend the context range of the attention mechanism, we employ a vector-quantized codebook to learn visual atoms. These visual atoms, which describe the structural and textural features of image patches, constitute external features for image inpainting and thus compensate for the limited internal features of the image. On this basis, we propose a dual-stream attention image inpainting method based on interacting and fusing internal-external features. Based on internal and external information sources, we design an internal mask attention module and an internal-external cross attention module. These two attention modules form a dual-stream attention to facilitate interaction within internal features and between internal and external features, thereby generating internal-source and external-source inpainting features. The internal mask attention uses a mask to shield the interference of missing-region features. It captures contextual information exclusively within the complete regions, thereby generating internal-source inpainting features. The internal-external cross attention lets internal and external features interact by calculating the similarity between internal features and external features composed of visual atoms, thereby generating external-source inpainting features. In addition, we design a controllable feature fusion module that generates spatial weight maps based on the correlation between internal-source and external-source inpainting features. These spatial weight maps fuse internal and external features by element-wise weighting of the internal-source and external-source inpainting features. Extensive experimental results on the Places2, FFHQ and Paris StreetView datasets demonstrate that the proposed method achieves average improvements of 3.45%, 1.34%, 13.91%, 13.64%, and 16.92% in the PSNR, SSIM, ℓ1, LPIPS, and FID metrics, respectively, compared with the state-of-the-art methods. Visualization experimental results demonstrate that both internal features and external features composed of visual atoms are beneficial for repairing corrupted images.
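The second stream and the fusion step can be sketched in the same spirit. The code below is an illustration under stated assumptions, not the paper's architecture: `cross_attention` treats each row of a hypothetical `codebook` array as one visual atom and uses plain scaled dot-product similarity, and `controllable_fusion` derives the spatial weight from the per-position cosine similarity of the two feature sets passed through a sigmoid. That mapping is an assumption made for illustration; the module in the paper is presumably learned.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def cross_attention(internal, codebook):
    """Internal-external cross attention (simplified).

    internal: (N, C) internal features, used as queries.
    codebook: (K, C) visual atoms, used as keys and values.
    Returns (N, C) external-source features, each a convex
    combination of the visual atoms.
    """
    scores = internal @ codebook.T / np.sqrt(internal.shape[1])  # (N, K)
    return softmax(scores, axis=-1) @ codebook


def controllable_fusion(f_int, f_ext, eps=1e-8):
    """Fuse two (N, C) feature sets with a per-position weight.

    Assumption: weight = sigmoid(cosine similarity), a stand-in for
    the learned weight-map generator described in the abstract.
    """
    dot = np.sum(f_int * f_ext, axis=1)
    norms = np.linalg.norm(f_int, axis=1) * np.linalg.norm(f_ext, axis=1)
    cos = dot / (norms + eps)                       # (N,), in [-1, 1]
    w = (1.0 / (1.0 + np.exp(-cos)))[:, None]       # (N, 1) spatial weights
    return w * f_int + (1.0 - w) * f_ext, w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    internal = rng.normal(size=(6, 4))   # e.g. output of the internal stream
    codebook = rng.normal(size=(16, 4))  # 16 hypothetical visual atoms
    external = cross_attention(internal, codebook)
    fused, w = controllable_fusion(internal, external)
    print(fused.shape, w.ravel())
```

Because the weight is computed per spatial position and broadcast across channels, each position can lean toward the internal-source or the external-source feature independently, which matches the element-wise weighting the abstract describes.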
