Neural Network Based Image Style Transfer: A Survey

WANG Wei; ZHANG Jing-yi; WEN Yu-hui; WEI Yun-chao

doi:10.12263/DZXB.20240930


Review | Updated: 2025-08-18

- Acta Electronica Sinica, 2025, Vol. 53, No. 5, pp. 1692-1712
- Affiliation:
  
  School of Computer Science and Technology, Beijing Jiaotong University, Beijing 100044
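- About the authors:
  
  WANG Wei, male, born in 1990. Ph.D., professor at the School of Computer Science and Technology, Beijing Jiaotong University. His main research interests are computer vision and machine learning. Chinese Institute of Electronics member No. E190029917M. E-mail: wei.wang@bjtu.edu.cn
  ZHANG Jing-yi, female, born in 2001. Master's student at the School of Computer Science and Technology, Beijing Jiaotong University. Her main research interest is computer vision. E-mail: 24120305@bjtu.edu.cn
  WEN Yu-hui, female, born in 1990. Ph.D., associate professor at the School of Computer Science and Technology, Beijing Jiaotong University. Her main research interests are computer vision, computer graphics, and machine learning. E-mail: yhwen1@bjtu.edu.cn
  WEI Yun-chao, male, born in 1986. Ph.D., professor at the School of Computer Science and Technology, Beijing Jiaotong University. His main research interests are computer vision and machine learning. E-mail: yunchao.wei@bjtu.edu.cn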
- Funding:
  
  Fundamental Research Funds for the Central Universities (2022XKRC015); National Natural Science Foundation of China (62372033)
- DOI: 10.12263/DZXB.20240930
  Chinese Library Classification (CLC) number: TP183
- Received: 2024-10-16; revised: 2025-04-21; published in print: 2025-05-25
WANG Wei, ZHANG Jing-yi, WEN Yu-hui, et al. Neural Network Based Image Style Transfer: A Survey[J]. Acta Electronica Sinica, 2025, 53(05): 1692-1712. DOI: 10.12263/DZXB.20240930

Abstract

As a key research direction in image editing, style transfer has shown broad application prospects in artistic creation and related fields. Since Gatys et al. proposed using correlations between deep convolutional features to capture texture information and to perform style transfer on that basis, numerous neural-network-based style transfer algorithms have emerged. In recent years, with the rise of various generative models, introducing generative adversarial networks (GANs), diffusion models, and other generative models into style transfer has attracted renewed attention. In addition, breakthroughs in image-text cross-modal tasks have made text-guided image style transfer possible. This paper classifies and describes current state-of-the-art methods. Specifically, according to the guiding condition, existing methods are divided into image-guided and text-guided image style transfer methods; according to network architecture, they are further subdivided into autoencoder-based, GAN-based, diffusion-model-based, and other architecture-based methods, giving a comprehensive review and analysis of current research on image style transfer. The paper then introduces the datasets and evaluation metrics used for image style transfer and compares several state-of-the-art methods experimentally, both quantitatively and qualitatively. Finally, it discusses the challenges facing current image style transfer techniques and offers an outlook on future research directions.
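The abstract attributes to Gatys et al. the idea of capturing texture through correlations between deep convolutional features. A minimal sketch of that idea is a Gram matrix computed over the channels of a feature map. Everything concrete below is an assumption made for illustration: the function names `gram_matrix` and `style_loss` are hypothetical, the random arrays stand in for real CNN activations, and dividing by the number of spatial positions is one common normalization rather than a detail taken from this survey.

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Channel-by-channel correlations of a (C, H, W) feature map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)   # one row per channel
    return flat @ flat.T / (h * w)      # (C, C); normalization is an assumption

def style_loss(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Mean squared difference between the two Gram matrices."""
    diff = gram_matrix(feat_a) - gram_matrix(feat_b)
    return float(np.mean(diff ** 2))

# Random stand-ins for CNN activations (hypothetical data, not real features).
rng = np.random.default_rng(0)
style_feat = rng.standard_normal((8, 16, 16))
other_feat = rng.standard_normal((8, 16, 16))

# Apply the same spatial shuffle to every channel.
perm = rng.permutation(16 * 16)
shuffled = style_feat.reshape(8, -1)[:, perm].reshape(8, 16, 16)

print("same features:     ", style_loss(style_feat, style_feat))
print("spatially shuffled:", style_loss(style_feat, shuffled))
print("different features:", style_loss(style_feat, other_feat))
```

Because the Gram matrix sums over spatial positions, shuffling those positions leaves it unchanged up to floating-point rounding. That is the sense in which feature correlations describe texture rather than layout.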
