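1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui 230601
2. School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen, Guangdong 518107
3. School of Computer Science and Technology, Xinjiang University, Urumqi, Xinjiang 830017
4. Xiaomi EV, Beijing 100085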
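ZHANG Ruixuan, female, was born in July 2002 in Zhengzhou, Henan Province. She is currently a master's student at Hefei University of Technology. Her main research interests are AI-generated content detection and adversarial security. E-mail: ruixuanzhangr@gmail.com
DIAO Yunfeng, male, was born in July 1993 in Yantai, Shandong Province. He is currently an associate professor at the School of Computer Science and Information Engineering, Hefei University of Technology. He has published more than 40 papers in well-known journals and conferences in China and abroad. His main research interests are media content security and artificial intelligence security. Chinese Institute of Electronics member No.: E190201650M. E-mail: diaoyunfeng@hfut.edu.cn
LU Zhiyuan, male, was born in December 2004 in Nanping, Fujian Province. He is currently an undergraduate student in computer science at Hefei University of Technology. His main research interest is adversarial examples. E-mail: 18063718180@163.com
XIA Haifeng, male, was born in July 1993 in Zaozhuang, Shandong Province. He received his Ph.D. in 2023 from the Department of Computer Science, Tulane University, USA. He is currently an associate professor at Sun Yat-sen University. His main research interests are computer vision, multimodal learning, and transfer learning. E-mail: xiahf5@mail.sysu.edu.cn
GUO Zhiqing, male, was born in September 1991 in Huocheng County, Xinjiang Uygur Autonomous Region. He is currently an associate professor and doctoral supervisor at the School of Computer Science and Technology, Xinjiang University. He has published more than 50 papers in China and abroad. E-mail: guozhiqing@xju.edu.cn
HAO Xiaoshuai, male, was born in October 1994 in Yantai, Shandong Province. He is currently an algorithm expert in autonomous driving and embodied intelligence at Xiaomi EV. He has published more than 50 papers in well-known journals and conferences in China and abroad. His main research interests are the robustness of autonomous driving and embodied foundation models. E-mail: haoxiaoshuai@xiaomi.com
WANG Meng, male, was born in December 1984 in Jianli, Hubei Province. He is currently Deputy Secretary of the Party Committee, President, professor, and doctoral supervisor at Hefei University of Technology. His main research interests include pattern recognition and multimedia information processing. He is a recipient of the National Science Fund for Distinguished Young Scholars, an IEEE Fellow, and an IAPR Fellow. Chinese Institute of Electronics member No.: E190011561M. E-mail: eric.mengwang@gmail.com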
Received: 2026-02-27
Accepted: 2026-03-12
Published in print: 2026-03-25
ZHANG Ruixuan, DIAO Yunfeng, LU Zhiyuan, et al. Adversarial Mixture of Experts Post-Training for Robust AI-Generated Image Detection[J]. Acta Electronica Sinica, 2026, 54(03): 1178-1193. DOI:10.12263/DZXB.20251196
AI-generated image (AIGI) technology enables the automated production of high-quality visual content and shows great application potential in artistic creation, digital entertainment, and virtual reality. While empowering content production, it also raises serious security and ethical challenges: generative models can be maliciously used to forge real people or events, creating false information, spreading deepfake content, and even interfering with online public opinion. Effectively identifying AI-generated images (AIGI detection) has therefore become an important research topic for ensuring the credibility of digital content and maintaining cyberspace security. However, existing AIGI detectors are generally not robust to adversarial attacks: an attacker only needs to add subtle adversarial perturbations, imperceptible to the human eye, to a synthetic image to bypass detection and have the synthetic content misclassified as real, and defenses against such attacks remain little studied. To address this problem, this paper first systematically evaluates the effectiveness of adversarial training on the AIGI detection task. Theoretical analysis and experimental results show that adversarial training is prone to inducing feature entanglement during training, which leads to severe degradation or even collapse of detection performance. A dedicated adversarial defense that is effective for AIGI detection is therefore urgently needed. In contrast to the feature entanglement seen in adversarial training, this paper finds that in detectors trained in the standard way, adversarial perturbations push the feature-space representations of adversarial examples clearly away from those of clean examples, making the two markedly separable. Based on this observation, this paper proposes modeling adversarial examples as an independent class and constructs a post-training defense framework: with the pre-trained feature extractor kept fixed, only new classification boundaries are learned to fit the feature distribution of adversarial examples. To improve generalization to unknown attacks, this paper further proposes an adversarial mixture-of-experts post-training mechanism, which uses multiple expert modules to learn the feature patterns of specific attack types and introduces a shared expert to capture representations common to different attacks, thereby achieving efficient modeling and robust recognition of multiple types of adversarial examples. Experimental results show that on mainstream AIGI datasets such as ProGAN and Stable Diffusion, under a variety of typical adversarial attacks and without sacrificing detection accuracy on benign examples, the proposed method improves average adversarial accuracy over existing mainstream defense methods by 18.92% and 12.56%, respectively, demonstrating good practicality and application potential in real-world security scenarios.
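The structure of the defense head summarized in the abstract can be pictured with a small sketch. It is an illustration under assumptions, not the paper's implementation: the three-way label set (real, generated, adversarial), the soft gating, the linear experts, and every name used here (`AdversarialMoEHead`, `feat_dim`, `num_experts`) are hypothetical. Features are assumed to come from a frozen, pre-trained extractor that is not shown, and the weights are random rather than trained.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear(w, b, x):
    """Affine map: w is a list of rows, b a list of biases, x a feature list."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def init_linear(rng, n_out, n_in):
    """Small random weights and zero biases (untrained, for illustration only)."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class AdversarialMoEHead:
    """Hypothetical classification head applied to frozen features.

    Routed experts are meant to specialize in individual attack types;
    one shared expert always contributes to capture what attacks have in common.
    """

    # Assumed label set: adversarial examples are treated as their own class.
    CLASSES = ("real", "generated", "adversarial")

    def __init__(self, feat_dim, num_experts, seed=0):
        rng = random.Random(seed)
        n_cls = len(self.CLASSES)
        self.gate = init_linear(rng, num_experts, feat_dim)
        self.experts = [init_linear(rng, n_cls, feat_dim) for _ in range(num_experts)]
        self.shared = init_linear(rng, n_cls, feat_dim)

    def forward(self, feat):
        """Return (class probabilities, gate weights) for one feature vector."""
        gate_w = softmax(linear(*self.gate, feat))
        logits = linear(*self.shared, feat)  # shared expert is always active
        for g, expert in zip(gate_w, self.experts):
            out = linear(*expert, feat)
            logits = [l + g * o for l, o in zip(logits, out)]  # gate-weighted sum
        return softmax(logits), gate_w

    def predict(self, feat):
        probs, _ = self.forward(feat)
        return self.CLASSES[probs.index(max(probs))]


# Usage: `feat` stands in for the output of the frozen feature extractor.
head = AdversarialMoEHead(feat_dim=4, num_experts=3, seed=1)
feat = [0.5, -1.0, 0.25, 2.0]
probs, gates = head.forward(feat)
print(head.predict(feat), probs, gates)
```

In this sketch only the head's parameters would be updated during post-training, which mirrors the idea of keeping the feature extractor fixed and learning new decision boundaries on top of it.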