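1. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing, Jiangsu 210007, China
2. Graduate School, Army Engineering University of PLA, Nanjing, Jiangsu 210007, China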
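ZHANG Qiang (张强), male, was born in Suqian, Jiangsu Province, in February 1991. He received his Ph.D. from Army Engineering University in 2024 and is now a postdoctoral researcher there. His research interests include machine learning, pattern recognition, and speech signal processing. E-mail: zq308297543@126.com
ZHANG Xiong-wei (张雄伟), male, was born in Jiaxing, Zhejiang Province, in November 1965. He received his Ph.D. in communication and electronic systems from the PLA Institute of Communications Engineering, Nanjing, in 1992 and is now a professor at Army Engineering University. His research interests include speech signal processing, intelligent information processing, and pattern recognition. E-mail: xwzhang9898@163.com
SUN Meng (孙蒙), male, was born in Dezhou, Shandong Province, in December 1984. He received his Ph.D. from the Department of Electronic Engineering, University of Leuven, in 2012 and is now a professor at Army Engineering University. His research interests include speech processing, unsupervised and semi-supervised machine learning, and sequential pattern recognition. E-mail: sunmeng@aeu.edu.cn
YANG Ji-bin (杨吉斌), male, was born in Mingguang, Anhui Province, in August 1978. He received his Ph.D. from the PLA University of Science and Technology in 2006 and is now an associate professor at Army Engineering University. His research interests include speech signal processing, machine learning, and pattern recognition. E-mail: yangjibin@aeu.edu.cn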
Received: 2024-12-16
Revised: 2025-05-10
Published in print: 2025-06-25
ZHANG Qiang, ZHANG Xiong-wei, SUN Meng, et al. Robust Adversarial Defense Boundary-Based Speech Forgery Method Recognition[J]. Acta Electronica Sinica, 2025, 53(06): 2022-2037. DOI:10.12263/DZXB.20241128
Anti-spoofing for deepfake speech is an important technique in generative artificial intelligence (AI) security. Beyond binary classification of real versus forged speech, speech forgery method recognition (SFMR) is becoming an important part of interpretable anti-spoofing strategies. To evade such recognition, attackers are likely to use adversarial example attacks, adding adversarial perturbations that are imperceptible to the human ear to the forged speech in order to degrade the accuracy of the SFMR model. To address this threat, the concept of an adversarial defense boundary is proposed from the defender's point of view. Based on this concept, the effects of network randomness and decision boundary distance on a model's adversarial robustness are analyzed theoretically using Taylor analysis, and an SFMR algorithm based on the robust adversarial defense boundary (RADB) is proposed. The algorithm uses two modules, random transform (RT) and decision boundary distance regularization (DBDR), to realize robust adversarial defense. The RT module simulates the interference that forged speech may undergo in real-world scenarios: it applies a random combination of transforms to the input speech during both training and inference, using randomness to improve adversarial robustness. The DBDR module introduces a decision boundary distance regularization loss that encourages the model to raise the upper bound of its adversarial robustness and reduces the sensitivity of its class predictions to adversarial perturbations. Experimental results on two typical SFMR datasets, Chinese Fake Audio Detection (CFAD) and the 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof2019), show that, compared with existing state-of-the-art baseline methods, the proposed algorithm improves SFMR accuracy under adversarial attacks by 5.63% and 5.95%, to 93.98% and 91.71%, respectively.
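As a rough illustration of the RT idea in the abstract (a random combination of transforms applied to the input at both training and inference), the following Python sketch may help. It is not the authors' implementation: the three transforms, their parameter ranges, and the names `random_transform` and `predict_with_rt` are hypothetical, and averaging scores over several random draws is an added assumption that the abstract does not state.

```python
import random

def add_noise(wave, rng, scale=0.005):
    """Additive Gaussian noise, a stand-in for background interference (assumed)."""
    return [s + rng.gauss(0.0, scale) for s in wave]

def random_gain(wave, rng, low=0.8, high=1.2):
    """Scale the whole waveform by a single random gain factor (assumed range)."""
    g = rng.uniform(low, high)
    return [s * g for s in wave]

def random_shift(wave, rng, max_shift=160):
    """Circularly shift the waveform by a random number of samples."""
    if not wave:
        return list(wave)
    k = rng.randint(-max_shift, max_shift) % len(wave)
    return wave[-k:] + wave[:-k] if k else list(wave)

# Hypothetical transform pool; the paper's actual pool is not given in the abstract.
TRANSFORMS = [add_noise, random_gain, random_shift]

def random_transform(wave, rng, max_ops=2):
    """Apply a randomly chosen subset of transforms, in random order."""
    n_ops = rng.randint(1, max_ops)
    out = list(wave)
    for op in rng.sample(TRANSFORMS, n_ops):
        out = op(out, rng)
    return out

def predict_with_rt(model, wave, rng, n_draws=1):
    """Score a randomly transformed input; n_draws > 1 averages the scores
    over several random draws (an assumption, not taken from the abstract)."""
    totals = None
    for _ in range(n_draws):
        scores = model(random_transform(wave, rng))
        if totals is None:
            totals = list(scores)
        else:
            totals = [a + b for a, b in zip(totals, scores)]
    return [t / n_draws for t in totals]
```

Because the same `random_transform` would be called inside the training loop and inside `predict_with_rt`, an attacker who computes a perturbation against one random draw is not guaranteed that the same draw will be used when the model classifies the sample, which is the intuition behind using randomness as a defense.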