

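1. University of Chinese Academy of Sciences, Beijing 100049
2. Institute of Software, Chinese Academy of Sciences, Beijing 100190
3. Key Laboratory of System Software (Chinese Academy of Sciences), Beijing 100049
4. ZTE Corporation, Shenzhen, Guangdong 518057
5. School of Cyber Science and Engineering, Southeast University, Nanjing, Jiangsu 211189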
Received: 10 July 2025
Accepted: 16 March 2026
Published: 25 March 2026
LI Zhenghao, YAN Xincheng, WANG Jigang, et al. Python Semantic Fuzzing Solution for API Interaction[J]. Acta Electronica Sinica, 2026, 54(03): 1280-1295. DOI:10.12263/DZXB.20250614
Third-party Python libraries are widely used in modern software ecosystems, and the security threats they face are increasingly severe. As the number of vulnerabilities in the Python Package Index (PyPI) continues to rise rapidly and dependency networks grow highly complex, a single vulnerability can easily propagate through dependency chains and affect a large number of downstream projects. When testing third-party Python libraries, existing fuzzing tools explore poorly in complex application programming interface (API) interaction scenarios because of the implicit data-flow relationships between interfaces. Meanwhile, Python's inherent dynamic features, such as duck typing and reflection, further reduce the accuracy of static analysis. This results in weak exploration of complex constraints guarded by multiple conditional checks and ultimately limits vulnerability discovery. In addition, traditional fuzzing engines often adopt random mutation strategies and therefore cannot tailor mutations to the test target. As a result, testing resources are largely spent on low-value shallow paths that have no state dependencies, which makes it difficult to reach vulnerabilities in deep code. To address these challenges, this paper presents PyBoros, an efficient fuzzing framework for third-party Python libraries. The framework constructs a package-level API dependency graph and applies the clique percolation method (CPM) for subgraph partitioning to accurately capture implicit dependencies. On this basis, it uses large language models to generate semantically rich initial fuzz harnesses that capture implicit dependencies; adopts a stagnation-triggered, non-blocking dynamic analysis, which treats stagnation of coverage growth as a signal to capture, on demand, constraint context such as runtime code state and variable snapshots, and uses large language models for constraint reasoning to generate breakthrough seeds; and introduces a resource scheduling strategy that combines API n-gram coverage guidance with branch value scoring to prioritize high-value paths during fuzzing. We evaluated PyBoros on 10 real-world third-party Python libraries with an average GitHub star count exceeding 2,000. PyBoros discovered 20 real vulnerabilities (10 of which are 0-day), a 100% increase in detected vulnerabilities over Atheris, and improved edge coverage by 8.57% over Atheris. The API coverage of the initially generated harnesses reaches 1.8 times that of Fuzz4All. Across four large language models (two open-source and two closed-source), the average acceptance rate of the generated harnesses is 72.6%. In additional perturbation experiments, even under a 10% static-analysis perturbation attack, the harness acceptance rate drops by only 3.8%, remaining robust. Overall, PyBoros provides an efficient and practical approach to the security analysis of third-party Python libraries.
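The abstract does not define API n-gram coverage or the stagnation signal precisely, so the following is only a minimal sketch of how the two ideas could fit together, not the PyBoros implementation. It assumes that an execution is summarized as an ordered list of API names, that coverage is the set of distinct length-n windows over that list, and that stagnation means a fixed number of consecutive executions adding no new window. The names `api_ngrams`, `NgramCoverage`, and `stall_limit` are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Set, Tuple

Ngram = Tuple[str, ...]


def api_ngrams(calls: Iterable[str], n: int = 2) -> Set[Ngram]:
    """Return the set of length-n sliding windows over an API call sequence."""
    window: Deque[str] = deque(maxlen=n)
    grams: Set[Ngram] = set()
    for name in calls:
        window.append(name)
        if len(window) == n:
            grams.add(tuple(window))
    return grams


class NgramCoverage:
    """Tracks seen API n-grams and counts consecutive runs without progress."""

    def __init__(self, n: int = 2, stall_limit: int = 3) -> None:
        self.n = n
        self.stall_limit = stall_limit  # hypothetical threshold, not from the paper
        self.seen: Set[Ngram] = set()
        self.stalled_runs = 0

    def record(self, calls: Iterable[str]) -> int:
        """Record one execution and return how many new n-grams it contributed."""
        new = api_ngrams(calls, self.n) - self.seen
        self.seen |= new
        # Any new n-gram resets the stall counter; otherwise it grows by one.
        self.stalled_runs = 0 if new else self.stalled_runs + 1
        return len(new)

    def stagnated(self) -> bool:
        """True once stall_limit consecutive runs added no new n-gram."""
        return self.stalled_runs >= self.stall_limit
```

In a fuzzing loop, the return value of `record` could serve as a novelty score for seed scheduling, while `stagnated()` could gate a more expensive step such as capturing runtime state for constraint reasoning. How PyBoros actually weights these signals is not stated in the abstract.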