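1. School of Information Science and Technology, Donghua University, Shanghai 201620
2. Engineering Research Center of Digitized Textile & Apparel Technology, Ministry of Education, Donghua University, Shanghai 201620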
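HUANG Guang-yuan (黄光远): male, born in 2000, a native of Xuzhou, Jiangsu. Master's student at the School of Information Science and Technology, Donghua University. His main research interest is image inpainting. E-mail: gyhuang@mail.dhu.edu.cn
HUANG Rong (黄荣): male, born in 1985, a native of Shaoxing, Zhejiang. Associate professor at the School of Information Science and Technology, Donghua University. His main research interests include image inpainting and semantic segmentation. E-mail: rong.huang@dhu.edu.cn
ZHOU Shu-bo (周树波): male, born in 1988, a native of Shaoxing, Zhejiang. Assistant research fellow at the School of Information Science and Technology, Donghua University. His main research interests include computational imaging and industrial visual inspection. E-mail: zhoushubo@dhu.edu.cn
JIANG Xue-qin (蒋学芹): male, born in 1981, a native of Suzhou, Jiangsu. Professor at the School of Information Science and Technology, Donghua University. His main research interests include graph signal processing and industrial visual inspection. E-mail: xqjiang@dhu.edu.cn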
Received: 2024-08-28
Revised: 2025-01-09
Published in print: 2025-04-25
HUANG Guang-yuan, HUANG Rong, ZHOU Shu-bo, et al. Dual-Stream Attention Image Inpainting Method Based on Interacting and Fusing Internal-External Features[J]. Acta Electronica Sinica, 2025, 53(04): 1293-1307. DOI:10.12263/DZXB.20240780
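Attention mechanisms and their variants are widely used in deep-learning-based image inpainting. They partition a corrupted image into intact and missing regions and capture long-range context from the intact regions to fill the missing ones. As the missing region grows, fewer intact-region features remain, which limits the attention mechanism and degrades the inpainting result. To widen the range of context that attention can capture, this paper learns visual atoms through a vector-quantized codebook. These visual atoms characterize the structure, texture, and other properties of image patches and make up the external features used for inpainting, compensating for the shortage of intact-region features inside the image. On this basis, the paper proposes a dual-stream attention image inpainting method that interacts and fuses internal and external features. Combining the internal and external information sources, the method designs an internal mask attention and an internal-external cross attention, which together form a dual-stream attention that realizes interaction among internal features and between internal and external features, producing internal-source and external-source inpainting features. The internal mask attention uses a mask to block interference from missing-region features and captures context only in the intact regions, producing the internal-source inpainting features. The internal-external cross attention computes the similarity between the internal features and the external features composed of visual atoms, realizing interaction between the two, and produces the external-source inpainting features. In addition, a controllable feature fusion module uses the correlation between internal-source and external-source inpainting features to generate a spatial weight map that selects between the two at each spatial position, thereby fusing internal and external features. Experiments on three public datasets, Places2, FFHQ, and Paris StreetView, show that the proposed method improves PSNR, SSIM, L1, LPIPS, and FID by 3.45%, 1.34%, 13.91%, 13.64%, and 16.92% on average over other state-of-the-art methods. Ablation and visualization results show that both the image's internal features and the external features composed of visual atoms benefit the inpainting of corrupted images.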
The attention mechanism and its variants have been widely applied in the field of image inpainting. They divide corrupted images into complete and missing regions, and capture long-range contextual information only within the complete regions to fill in the missing regions. As the area of the missing regions increases, the features of the complete regions decrease, which limits the performance of the attention mechanism and leads to suboptimal inpainting results. To extend the context range of the attention mechanism, we employ a vector-quantized codebook to learn visual atoms. These visual atoms, which describe the structural and textural characteristics of image patches, constitute external features for image inpainting and thus compensate for the shortage of internal features in the image. On this basis, we propose a dual-stream attention image inpainting method based on interacting and fusing internal-external features. Drawing on the internal and external information sources, we design an internal mask attention module and an internal-external cross attention module. These two attention modules form a dual-stream attention that facilitates interaction within internal features and between internal and external features, thereby generating internal-source and external-source inpainting features. The internal mask attention uses a mask to shield against interference from missing-region features. It captures contextual information exclusively within the complete regions, thereby generating internal-source inpainting features. The internal-external cross attention enables interaction between internal and external features by calculating the similarity between the internal features and the external features composed of visual atoms, thereby generating external-source inpainting features. In addition, we design a controllable feature fusion module that generates spatial weight maps based on the correlation between internal-source and external-source inpainting features. These spatial weight maps fuse internal and external features by element-wise weighting of the internal-source and external-source inpainting features. Extensive experimental results on the Places2, FFHQ and Paris StreetView datasets demonstrate that the proposed method achieves average improvements of 3.45%, 1.34%, 13.91%, 13.64%, and 16.92% in the PSNR, SSIM, L1, LPIPS, and FID metrics, respectively, compared with state-of-the-art methods. Visualization results demonstrate that both internal features and external features composed of visual atoms are beneficial for repairing corrupted images.
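To make the data flow described in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes single-head scaled dot-product attention without learned projections, a tiny hand-written codebook standing in for the learned visual atoms, and a sigmoid of the per-position dot product as the spatial weight. The names `attend`, `fuse`, `codebook`, and `valid` are hypothetical, and the paper's actual modules may differ substantially.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf receive weight 0.
    Assumes at least one finite score."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values, key_valid=None):
    """Scaled dot-product attention.
    If key_valid is given, keys marked False are hidden from every query."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [dot(q, k) / scale for k in keys]
        if key_valid is not None:
            scores = [s if ok else -math.inf for s, ok in zip(scores, key_valid)]
        w = softmax(scores)
        dim = len(values[0])
        out.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(dim)])
    return out


def fuse(internal_src, external_src):
    """Assumed fusion rule: a per-position weight in (0, 1), derived from the
    correlation (dot product) of the two streams, blends them element-wise."""
    fused, weights = [], []
    for a, b in zip(internal_src, external_src):
        w = 1.0 / (1.0 + math.exp(-dot(a, b) / math.sqrt(len(a))))
        weights.append(w)
        fused.append([w * x + (1.0 - w) * y for x, y in zip(a, b)])
    return fused, weights


# Four spatial positions with 2-D features; positions 2 and 3 are missing.
features = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [-5.0, 5.0]]
valid = [True, True, False, False]
# Hypothetical codebook of three visual atoms (learned in the real method).
codebook = [[1.0, 1.0], [-1.0, 1.0], [0.5, -0.5]]

# Internal stream: every position attends only to intact positions.
internal_src = attend(features, features, features, key_valid=valid)
# External stream: every position attends to the codebook entries.
external_src = attend(features, codebook, codebook)
fused, weights = fuse(internal_src, external_src)
```

In this toy setup the masked stream can only return convex combinations of the two intact feature vectors, which illustrates why the internal source becomes less informative as the missing region grows, while the external stream still has the full codebook to draw on.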