Evaluation Metrics for Adversarial Robustness Based on the Smoothness of Decision Boundary in Deep Learning Models

WU Tao; WANG Jun-jie; CAO Xin-wen; WANG Lian; XIAN Xing-ping; ZHANG Rui-kang

doi:10.12263/DZXB.20240932


PAPERS | Updated: 2025-10-16

- ACTA ELECTRONICA SINICA Vol. 53, Issue 6, Pages: 2090-2103(2025)
- Affiliation: School of Cyber Security and Information Law, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Funding: National Natural Science Foundation of China (62376047; 62106030); Chongqing Natural Science Foundation Innovation and Development Joint Fund Key Project (CSTB2023NSCQ-LZX0003); Chongqing Municipal Education Commission Science and Technology Research Program Key Project (KJZD-K202300603); Chongqing Technological Innovation and Application Development Project (CSTB2022TIAD-GPX0014); China University Industry-University-Research Innovation Fund Project (2022BL105)
- DOI: 10.12263/DZXB.20240932
- CLC: TP391
- Received: 16 October 2024; Revised: 15 March 2025; Published: 25 June 2025
- Citation: WU Tao, WANG Jun-jie, CAO Xin-wen, et al. Evaluation Metrics for Adversarial Robustness Based on the Smoothness of Decision Boundary in Deep Learning Models[J]. Acta Electronica Sinica, 2025, 53(06): 2090-2103. DOI: 10.12263/DZXB.20240932

Abstract

The adversarial robustness of deep learning models is crucial to the development of trustworthy artificial intelligence. Researchers commonly evaluate robustness indirectly through adversarial attacks, but such evaluations depend on the specific attack method and the perturbation level, and therefore fail to reflect the intrinsic characteristics of a model. Meanwhile, the few existing metrics that assess adversarial robustness directly either require prior knowledge of the adversarial perturbations or assume that the training data follow a specific distribution, which limits their applicability. To address these problems, and starting from the intrinsic characteristics of the model, this paper proposes a simple and effective adversarial robustness evaluation metric based on decision boundary smoothness, DBSE (Decision Boundary Shannon Entropy). The method exploits the correlation between adversarial robustness and decision boundary smoothness and proposes a "decision space search strategy" that obtains boundary samples to approximate the model's actual decision boundary. Singular value decomposition is then used to extract the spatial structure of the approximate decision boundary, and Shannon entropy is used to quantify how uniformly the variation is distributed across directions, which yields the DBSE metric. Experimental results show that, compared with the representative evaluation metrics ASR (Attack Success Rate), EBD (Empirical Boundary Distance), ACTC (Average Confidence of True Class), ACAC (Average Confidence of Adversarial Class), MP (Minimal Perturbation), and ROBY, DBSE performs better in terms of independence, effectiveness, and efficiency, does not depend on adversarial attack methods, and reduces time cost by 55% relative to EBD.
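The pipeline described in the abstract (collect boundary samples, apply singular value decomposition, summarize the spectrum with Shannon entropy) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives neither the search strategy nor the exact entropy formula. The bisection search, the centering step, the normalization of singular values into a probability distribution, and the names `bisect_to_boundary` and `singular_value_entropy` are all assumptions made for illustration. How the entropy value maps to "more" or "less" robust is also not stated here, so the sketch only computes the quantity.

```python
import numpy as np


def bisect_to_boundary(predict, x_a, x_b, steps=30):
    """Approximate a boundary sample by bisecting the segment between two
    points that receive different labels. Illustrative stand-in only; the
    paper's "decision space search strategy" is not specified in the abstract.
    `predict` is a hypothetical callable mapping one sample to a label."""
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    label_a = predict(x_a)
    if predict(x_b) == label_a:
        raise ValueError("endpoints must receive different labels")
    for _ in range(steps):
        mid = 0.5 * (x_a + x_b)
        if predict(mid) == label_a:
            x_a = mid  # midpoint is still on x_a's side of the boundary
        else:
            x_b = mid  # midpoint has crossed to the other side
    return 0.5 * (x_a + x_b)


def singular_value_entropy(samples, normalize=True):
    """Shannon entropy of the singular-value spectrum of a sample matrix
    (rows are samples). Assumed formulation: center the samples, treat the
    singular values divided by their sum as a distribution, and optionally
    divide by log(k) so the result lies in [0, 1]."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)
    s = np.linalg.svd(x, compute_uv=False)
    total = s.sum()
    if total == 0.0:
        return 0.0
    p = s / total
    p = p[p > 0]  # skip zero entries so log is defined
    h = float(-np.sum(p * np.log(p)))
    if normalize and len(s) > 1:
        h /= float(np.log(len(s)))
    return h


if __name__ == "__main__":
    # Toy linear classifier whose boundary is the line x0 + x1 = 1.
    def toy_predict(x):
        return int(x[0] + x[1] > 1.0)

    starts = [(0.0, 0.0), (0.2, -0.5), (-1.0, 0.5), (0.4, 0.1)]
    ends = [(2.0, 2.0), (1.5, 3.0), (2.5, 0.5), (0.9, 1.8)]
    boundary = np.array(
        [bisect_to_boundary(toy_predict, a, b) for a, b in zip(starts, ends)]
    )
    print("boundary samples:\n", boundary)
    print("spectrum entropy:", singular_value_entropy(boundary))
```

In this toy case the boundary samples lie on a straight line, so nearly all variation falls along one direction and the entropy is close to zero. Samples spread evenly across directions give a value close to one under the assumed normalization.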


