Generative Image Detection Based on Diffusion Artifact Contrast Learning

YUAN Chengsheng; CHEN Jinrui; CAO Yi; LIU Qingcheng; ZHOU Zhili; FU Zhangjie

doi:10.12263/DZXB.20250663

PAPERS | Updated: 2026-06-04

- ACTA ELECTRONICA SINICA Vol. 54, Issue 1, Pages: 248-261 (2026)
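- Affiliations:
  1. School of Computer Science and School of Cyberspace Security, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044
  2. Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044
  3. School of Cyber Security and Informatization, Wuxi University, Wuxi, Jiangsu 214105
  4. Institute of Artificial Intelligence, Guangzhou University, Guangzhou, Guangdong 510006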
- Funding:
  National Natural Science Foundation of China (U22B2062; U23B2023; 62102189)
- CLC: TP309.2
- Received: 29 July 2025; Accepted: 26 December 2025; Published: 25 January 2026

YUAN Chengsheng, CHEN Jinrui, CAO Yi, et al. Generative Image Detection Based on Diffusion Artifact Contrast Learning[J]. Acta Electronica Sinica, 2026, 54(01): 248-261. DOI: 10.12263/DZXB.20250663

Abstract

Generative artificial intelligence, led by diffusion models, keeps advancing in visual content synthesis. The images it generates now approach real photographs in visual realism and content diversity, and in some respects surpass them. However, this rapid progress has also made the detection and identification of generated images increasingly complex and challenging, especially for deepfake content that may be used for malicious purposes. Most existing detection algorithms perform well in controlled laboratory settings. In open real-world scenarios, however, their generalization and robustness often fall short once there is a significant distribution shift between training and test data. Examples include unknown generative models, unseen image styles, and forged samples subjected to complex post-processing. To address these challenges, this paper approaches the problem from the perspective of hard sample classification. It proposes a generated image detection method based on contrastive learning of diffusion artifacts (CLDA), which uses multi-module collaborative optimization to improve the model's detection accuracy and robustness on generated images. First, challenging generated samples are constructed with high-quality diffusion models to provide a richer data foundation for model training. Next, an artifact enhancement module is designed that introduces a latent space cross-domain enhancement strategy. This strategy expands the forged feature space through feature interpolation weighted by cosine similarity. A domain loss is also added to guide the encoder to learn discriminative features across different forgery domains, which keeps the model from over-relying on specific forgery patterns. Furthermore, a contrastive loss function based on latent space boundaries is proposed. It uses dynamic weights to focus on hard sample pairs near the decision boundary, strengthening the model's ability to discern subtle differences among real images, generated images, and inverted images. This loss is combined with the binary cross-entropy loss to form a unified multi-objective optimization function. To validate the proposed method, comparative experiments were conducted on two public datasets, GenImage and DRCT-2M. The results show that detectors optimized with the proposed framework improve average accuracy by 1.1 percentage points on GenImage and by 4.8 percentage points on DRCT-2M. In addition, under perturbations such as image scaling, JPEG compression, and Gaussian noise, the proposed method maintains high average detection accuracy, and its robustness is significantly better than that of the existing methods it is compared with.
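
The abstract does not say how the inverted images are produced. The equations below assume, as a labeled assumption rather than a stated detail, a deterministic DDIM inversion followed by reconstruction. In that case, one inversion step maps x_t to x_{t+1} by reusing the noise prediction made at step t. Here alpha-bar_t is the cumulative noise schedule and epsilon_theta is the noise-prediction network:

```latex
\hat{x}_0^{(t)} = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}},
\qquad
x_{t+1} = \sqrt{\bar{\alpha}_{t+1}}\,\hat{x}_0^{(t)} + \sqrt{1-\bar{\alpha}_{t+1}}\,\epsilon_\theta(x_t, t)
```

Running the ordinary DDIM sampler back from x_T then yields a reconstruction. It keeps the content of the real input but passes through the generator once more. This is one plausible reason such inverted samples would sit close to real images in feature space and count as hard samples.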
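
The abstract names feature interpolation weighted by cosine similarity but gives no formula. The sketch below shows one hypothetical way to do it. Donor features from other forgery domains are weighted by a temperature-scaled softmax over their cosine similarity to an anchor feature. The anchor is then moved toward that weighted mix. The names `cross_domain_interpolate`, `beta`, and `tau`, and the softmax weighting itself, are assumptions for illustration and not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)  # epsilon guards against zero vectors


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_domain_interpolate(anchor, donors, beta=0.5, tau=0.1):
    """Synthesise a new fake feature from `anchor` and features of other forgery domains.

    Hypothetical scheme: donors that are more similar to the anchor (cosine
    similarity sharpened by temperature `tau`) receive larger weights, and
    `beta` sets how far the result moves from the anchor toward the
    weighted donor mix. Returns (new_feature, donor_weights).
    """
    weights = softmax([cosine(anchor, d) / tau for d in donors])
    mix = [sum(w * d[i] for w, d in zip(weights, donors)) for i in range(len(anchor))]
    new_feature = [(1 - beta) * a + beta * mx for a, mx in zip(anchor, mix)]
    return new_feature, weights


# Usage: an anchor feature from one generator, two donor features from other generators.
anchor = [1.0, 0.0]
donors = [[1.0, 0.1], [0.0, 1.0]]
new_feature, weights = cross_domain_interpolate(anchor, donors)
```

Because the softmax weights are non-negative and sum to 1, any `beta` in [0, 1] makes the result a convex combination of observed features. The synthetic feature therefore expands the forged region by filling in between domains instead of extrapolating beyond them.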
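
The boundary-based contrastive loss is described only as dynamically weighting hard sample pairs near the decision boundary. The sketch below starts from a standard pairwise margin contrastive loss and adds a hypothetical hardness weight. Same-label pairs that sit far apart and different-label pairs that sit closer than the margin get larger weights. The hardness definition, the `exp(gamma * h)` weighting, and the label coding (0 = real, 1 = generated, 2 = inverted) are all assumptions, not details given in the abstract.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def boundary_contrastive_loss(embs, labels, margin=1.0, gamma=2.0):
    """Pairwise margin contrastive loss, re-weighted toward hard pairs.

    Same-label pairs are pulled together (loss d^2). Different-label pairs
    are pushed at least `margin` apart (loss max(0, margin - d)^2).
    Hypothetical hardness: h = d / margin for same-label pairs and
    h = 1 - d / margin for different-label pairs, both clipped to [0, 1].
    Each pair is weighted by exp(gamma * h), normalised to mean 1, so
    gamma = 0 recovers the unweighted loss.
    """
    losses, raw_weights = [], []
    for i in range(len(embs)):
        for j in range(i + 1, len(embs)):
            d = euclidean(embs[i], embs[j])
            if labels[i] == labels[j]:
                losses.append(d ** 2)
                hardness = min(d / margin, 1.0)
            else:
                losses.append(max(0.0, margin - d) ** 2)
                hardness = max(1.0 - d / margin, 0.0)
            raw_weights.append(math.exp(gamma * hardness))
    mean_w = sum(raw_weights) / len(raw_weights)
    return sum(w / mean_w * loss for w, loss in zip(raw_weights, losses)) / len(losses)


# Hypothetical labels: 0 = real, 1 = generated, 2 = inverted.
embs = [[0.0, 0.0], [0.2, 0.0], [3.0, 0.0]]
labels = [0, 2, 2]
weighted = boundary_contrastive_loss(embs, labels, gamma=2.0)
plain = boundary_contrastive_loss(embs, labels, gamma=0.0)
```

In this example, most of the loss comes from a real/inverted pair only 0.2 apart and from two inverted samples 2.8 apart. Both are hard pairs, so the weighted loss exceeds the plain one. Well-separated pairs contribute nothing either way.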
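
The abstract states that the contrastive loss is combined with binary cross-entropy into one multi-objective function. It also mentions a domain loss over forgery domains. The sketch below assembles such an objective as a weighted sum. The weights `lam_con` and `lam_dom`, the choice of softmax cross-entropy over generator identity as the domain loss, and the function names are hypothetical.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for one logit (target: 0 = real, 1 = generated)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def domain_cross_entropy(domain_logits, domain_label):
    """Softmax cross-entropy of one sample over forgery-domain classes."""
    top = max(domain_logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in domain_logits))
    return log_z - domain_logits[domain_label]


def total_loss(logits, targets, contrastive, domain_logits=None, domain_labels=None,
               lam_con=0.5, lam_dom=0.1):
    """Hypothetical objective: mean BCE + lam_con * contrastive (+ lam_dom * mean domain CE).

    Domain terms may cover only the generated samples, so they are averaged
    separately from the BCE terms.
    """
    loss = sum(bce_with_logits(x, y) for x, y in zip(logits, targets)) / len(logits)
    loss += lam_con * contrastive
    if domain_logits is not None:
        dom = sum(domain_cross_entropy(z, k) for z, k in zip(domain_logits, domain_labels))
        loss += lam_dom * dom / len(domain_logits)
    return loss


# Usage: two detector logits, a precomputed contrastive term, one fake sample's domain logits.
example = total_loss(logits=[2.0, -1.5], targets=[1, 0], contrastive=0.3,
                     domain_logits=[[1.2, 0.1, -0.4]], domain_labels=[0])
```

The BCE uses the identity max(x, 0) - x*y + log(1 + exp(-|x|)), which avoids overflow for large-magnitude logits.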


