

浏览全部资源
扫码关注微信
1.高可信软件技术教育部重点实验室(北京大学),北京 100871
2.北京大学计算机学院,北京 100871
Received:19 October 2023,
Revised:2024-02-09,
Published:25 April 2024
移动端阅览
张昕,王冠成,吴宜谦,等. 贝叶斯程序分析[J]. 电子学报,2024,52(04):1155-1172.
ZHANG Xin, WANG Guan-cheng, WU Yi-qian, et al. Bayesian Program Analysis[J]. Acta Electronica Sinica, 2024, 52(04): 1155-1172.
张昕,王冠成,吴宜谦,等. 贝叶斯程序分析[J]. 电子学报,2024,52(04):1155-1172. DOI:10.12263/DZXB.20230973
ZHANG Xin, WANG Guan-cheng, WU Yi-qian, et al. Bayesian Program Analysis[J]. Acta Electronica Sinica, 2024, 52(04): 1155-1172. DOI:10.12263/DZXB.20230973
程序分析在软件开发和维护中发挥着关键作用.然而,传统基于逻辑的程序分析方法在处理现代复杂、大规模和动态特性丰富的软件系统时往往效果有限,其根源在于软件系统中的不确定性.研究人员针对具体的程序分析问题提出了一系列新的技术,其特征是在传统逻辑分析的基础上结合概率信息来捕获软件系统中的不确定性.通过总结和抽象这些已有工作,本文提出了贝叶斯程序分析框架,其核心思想是结合程序分析和贝叶斯统计推断,通过建模和更新关于程序的概率分布来推断有关程序行为的信息.贝叶斯程序分析采用概率逻辑编程来同时处理概率信息和逻辑信息,用统一的方式捕获了现有的多项不同工作,也能泛化到程序缺陷定位和差异调试等非传统程序静态分析任务上.本文给出了贝叶斯程序分析框架的定义,展示了该框架在程序分析和相关领域的应用,并展望了未来发展方向.
Program analysis plays a critical role in software development and maintenance. However
traditional logic-based program analysis methods exhibit significant limitations when dealing with modern
complex
large-scale
and dynamically rich software systems. The root cause of these limitations lies in the uncertainty present in software systems. To address this issue
researchers have proposed a series of new techniques for specific program analysis problems. These techniques combine probability information with traditional logic analysis to capture the uncertainty inherent in software systems. By summarizing and abstracting existing work in this area
this paper introduces the Bayesian program analysis framework. The core idea of this framework is to integrate program analysis with Bayesian statistical inference. It does so by modeling and updating probability distributions about the program to infer information about program behavior. Bayesian program analysis employs probabilistic logic programming to simultaneously handle both probability and logic information
providing a unified approach that encompasses various existing works. It can also be generalized to non-traditional static program analysis tasks
such as program fault localization and delta debugging. This paper provides a definition of the Bayesian program analysis framework
demonstrates its applications in program analysis and related fields
and outlines future directions for development.
JOHNSON-FREYD P A . Introduction to Static Analysis [R ] . Livermore : Sandia National Lab . (SNL-CA) , 2019 .
张健 , 张超 , 玄跻峰 , 等 . 程序分析研究进展 [J ] . 软件学报 , 2019 , 30 ( 1 ): 80 - 109 .
ZHANG J , ZHANG C , XUAN J F , et al . Recent progress in program analysis [J ] . Journal of Software , 2019 , 30 ( 1 ): 80 - 109 . (in Chinese)
陆申明 , 左志强 , 王林章 . 静态程序分析并行化研究进展 [J ] . 软件学报 , 2020 , 31 ( 5 ): 1243 - 1254 .
LU S M , ZUO Z Q , WANG L Z . Progress in parallelization of static program analysis [J ] . Journal of Software , 2020 , 31 ( 5 ): 1243 - 1254 . (in Chinese)
戚晓芳 , 徐宝文 , 周晓宇 . 一种基于程序可达图的并发程序依赖性分析方法 [J ] . 电子学报 , 2007 , 35 ( 2 ): 287 - 291 .
QI X F , XU B W , ZHOU X Y . An approach to analyzing dependence of concurrent programs based on program reachability graphs [J ] . Acta Electronica Sinica , 2007 , 35 ( 2 ): 287 - 291 . (in Chinese)
YAO P S , SHI Q K , HUANG H Q , et al . Program analysis via efficient symbolic abstraction [J ] . Proceedings of the ACM on Programming Languages , 5 (OOPSLA): 118.
LATTNER C , ADVE V . LLVM: A compilation framework for lifelong program analysis & transformation [C ] // International Symposium on Code Generation and Optimization . Piscataway : IEEE , 2004 : 75 - 86 .
张大林 , 金大海 , 宫云战 , 等 . 基于缺陷关联的静态分析优化 [J ] . 软件学报 , 2014 , 25 ( 2 ): 386 - 399 .
ZHANG D L , JIN D H , GONG Y Z , et al . Optimizing static analysis based on defect correlations [J ] . Journal of Software , 2014 , 25 ( 2 ): 386 - 399 . (in Chinese)
MA W W , CHEN L , ZHANG X Y , et al . How do developers fix cross-project correlated bugs? A case study on the GitHub scientific python ecosystem [C ] // Proceedings of the 39th International Conference on Software Engineering . Piscataway : IEEE , 2017 : 381 - 392 .
REPS T , HORWITZ S , SAGIV M . Precise interprocedural dataflow analysis via graph reachability [C ] // Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages . New York : ACM , 1995 : 49 - 61 .
COUSOT P , COUSOT R . Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints [C ] // Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of Programming Languages . New York : ACM , 1977 : 238 - 252 .
KAM J B , ULLMAN J D . Monotone data flow analysis frameworks [J ] . Acta Informatica , 1977 , 7 ( 3 ): 305 - 317 .
CLARKE E M , HENZINGER T A , VEITH H . Introduction to model checking [M ] // Handbook of Model Checking . Cham : Springer , 2018 : 1 - 26 .
SCHMIDT D A . Data flow analysis is model checking of abstract interpretations [C ] // Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages . New York : ACM , 1998 : 38 - 48 .
HUANG S S , GREEN T J , LOO B T . Datalog and emerging applications: An interactive tutorial [C ] // Proceedings of the 2011 ACM SIGMOD International Conference on Management of data . New York : ACM , 2011 : 1213 - 1216 .
SHI Q K , YAO P S , WU R X , et al . Path-sensitive sparse analysis without path conditions [C ] // Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation . New York : ACM , 2021 : 930 - 943 .
MANGAL R , ZHANG X , NORI A V , et al . A user-guided approach to program analysis [C ] // Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering . New York : ACM , 2015 : 462 - 473 .
RAGHOTHAMAN M , KULKARNI S , HEO K , et al . User-guided program reasoning using Bayesian inference [C ] // Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation . New York : ACM , 2018 : 722 - 735 .
CHEN T Y , HEO K , RAGHOTHAMAN M . Boosting static analysis accuracy with instrumented test executions [C ] // Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering . New York : ACM , 2021 : 1154 - 1165 .
ZENG M H , WU Y Q , YE Z T , et al . Fault localization via efficient probabilistic modeling of program semantics [C ] // 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE) . Piscataway : IEEE , 2022 : 958 - 969 .
WANG G C , SHEN R B , CHEN J J , et al . Probabilistic delta debugging [C ] // Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering . New York : ACM , 2021 : 881 - 892 .
ZHANG X , SI X J , NAIK M . Combining the logical and the probabilistic in program analysis [C ] // Proceedings of the 1st ACM SIGPLAN International Workshop on Machine Learning and Programming Languages . New York : ACM , 2017 : 27 - 34 .
ZHANG X , MANGAL R , GRIGORE R , et al . On abstraction refinement for program analyses in datalog [C ] // Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation . New York : ACM , 2014 : 239 - 248 .
HEO K , RAGHOTHAMAN M , SI X J , et al . Continuously reasoning about programs using differential Bayesian inference [C ] // Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation . New York : ACM , 2019 : 561 - 575 .
KIM H , RAGHOTHAMAN M , HEO K . Learning probabilistic models for static analysis alarms [C ] // 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE) . Piscataway : IEEE , 2022 : 1282 - 1293 .
ZHANG X , GRIGORE R , SI X , ET AL . Effective interactive resolution of static analysis alarms [J ] . Proceedings of the ACM on Programming Languages , 2017 , 1(OOPSLA): 1- 30 .
SEN K . Race directed random testing of concurrent programs [C ] // Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation . New York : ACM , 2008 : 11 - 21 .
BLACKBURN S M , GARNER R , HOFFMANN C , et al . The DaCapo benchmarks: Java benchmarking development and analysis [C ] // Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-oriented Programming Systems, Languages, and Applications . New York : ACM , 2006 : 169 - 190 .
FENG Y , ANAND S , DILLIG I , et al . Apposcopy: Semantics-based detection of Android malware through static analysis [C ] // Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering . New York : ACM , 2014 : 576 - 587 .
NETHERCOTE N , SEWARD J . Valgrind: A framework for heavyweight dynamic binary instrumentation [J ] . ACM SIGPLAN Notices , 2007 , 42 ( 6 ): 89 - 100 .
FLANAGAN C , FREUND S N . The RoadRunner dynamic analysis framework for concurrent programs [C ] // Proceedings of the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering . New York : ACM , 2010 : 1 - 8 .
CHANG W , STREIFF B , LIN C . Efficient and extensible security enforcement using dynamic data flow analysis [C ] // Proceedings of the 15th ACM Conference on Computer and Communications Security . New York : ACM , 2008 : 39 - 50 .
BODÍK R , GUPTA R , SARKAR V . ABCD: Eliminating array bounds checks on demand [C ] // Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation . New York : ACM , 2000 : 321 - 333 .
SEN K , MARINOV D , AGHA G . CUTE [J ] . ACM SIGSOFT Software Engineering Notes , 2005 , 30 ( 5 ): 263 - 272 .
CSALLNER C , SMARAGDAKIS Y . Check ‘n’ crash: Combining static checking and testing [C ] // Proceedings of 27th International Conference on Software Engineering . Piscataway : IEEE , 2005 : 422 - 431 .
GUPTA A , MAJUMDAR R , RYBALCHENKO A . From tests to proofs [C ] // International Conference on Tools and Algorithms for the Construction and Analysis of Systems . Berlin : Springer , 2009 : 262 - 276 .
NAIK M , YANG H , CASTELNUOVO G , et al . Abstractions from tests [C ] // Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages . New York : ACM , 2012 : 373 - 386 .
NORI A V , RAJAMANI S K , TETALI S , et al . The yogi project: Software property checking via static analysis and testing [C ] // International Conference on Tools and Algorithms for the Construction and Analysis of Systems . Berlin : Springer , 2009 : 178 - 181 .
CERI S , GOTTLOB G , TANCA L . What you always wanted to know about datalog (and never dared to ask) [J ] . IEEE Transactions on Knowledge and Data Engineering , 1989 , 1 ( 1 ): 146 - 166 .
KWIATKOWSKA M , NORMAN G , PARKER D . PRISM: Probabilistic symbolic model checker [M ] // Computer Performance Evaluation: Modelling Techniques and Tools . Berlin : Springer , 2002 : 200 - 204 .
DE RAEDT L , KIMMIG A , TOIVONEN H . ProbLog: A probabilistic prolog and its application in link discovery [C ] // Proceedings of the 20th International Joint Conference on Artifical Intelligence . New York : ACM , 2007 : 2468 - 2473 .
KOLLER D , FRIEDMAN N . Probabilistic Graphical Models: Principles and Techniques [M ] . Cambridge : MIT Press , 2009 .
CHAVIRA M , DARWICHE A . On probabilistic inference by weighted model counting [J ] . Artificial Intelligence , 2008 , 172 ( 6/7 ): 772 - 799 .
CHAVIRA M , DARWICHE A . Compiling Bayesian networks using variable elimination [C ] // Proceedings of the 20th International Joint Conference on Artifical Intelligence . New York : ACM , 2007 : 2443 - 2449 .
YEDIDIA J S , FREEMAN W T , WEISS Y . Generalized belief propagation [C ] // Proceedings of the 13th International Conference on Neural Information Processing Systems . New York : ACM , 2000 : 668 - 674 .
MURPHY K , WEISS Y , JORDAN M I . Loopy belief propagation for approximate inference: An empirical study [EB/OL ] . ( 2013-01-23 )[ 2023-09-30 ] . https://arxiv.org/abs/1301.6725 https://arxiv.org/abs/1301.6725 .
ANDRIEU C , DE FREITAS N , DOUCET A , et al . An introduction to MCMC for machine learning [J ] . Machine Learning , 2003 , 50 ( 1 ): 5 - 43 .
ANDRIEU C , THOMS J . A tutorial on adaptive MCMC [J ] . Statistics and Computing , 2008 , 18 ( 4 ): 343 - 373 .
MEENT J W V D , PAIGE B , YANG H , et al . An introduction to probabilistic programming [EB/OL ] . ( 2018-09-27 )[ 2023-09-30 ] . https://arxiv.org/abs/1809.10756 https://arxiv.org/abs/1809.10756 .
BLUNDELL C , CORNEBISE J , KAVUKCUOGLU K , et al . Weight uncertainty in neural networks [C ] // Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37 . New York : ACM , 2015 : 1613 - 1622 .
MANGAL R , ZHANG X , NORI A V , et al . Volt: A lazy grounding framework for solving very large MaxSAT instances [C ] // International Conference on Theory and Applications of Satisfiability Testing . Cham : Springer , 2015 : 299 - 306 .
SUN Q , ZHAO J J , CHEN Y T . Probabilistic points-to analysis for Java [C ] // Proceedings of the 20th International Conference on Compiler construction: Part of the Joint European Conferences on Theory and Practice of Software . New York : ACM , 2011 : 62 - 81 .
DARWICHE A . Modeling and Reasoning with Bayesian Networks [M ] . Cambridge : Cambridge University Press , 2009 .
NG S K , KRISHNAN T , MCLACHLAN G J . The EM algorithm [M ] // Handbook of Computational Statistics . Berlin : Springer , 2011 : 139 - 172 .
FOX C W , ROBERTS S J . A tutorial on variational Bayesian inference [J ] . Artificial Intelligence Review , 2012 , 38 ( 2 ): 85 - 95 .
KROESE D P , RUBINSTEIN R Y . Monte Carlo methods [J ] . WIREs Computational Statistics , 2012 , 4 ( 1 ): 48 - 58 .
CHOW C , LIU C . Approximating discrete probability distributions with dependence trees [J ] . IEEE Transactions on Information Theory , 1968 , 14 ( 3 ): 462 - 467 .
SHAH A , SHAH D , WORNELL G . On learning continuous pairwise Markov random fields [C ] // Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) . SanDiego : PMLR , 2021 : 1153 - 1161 .
MENDELSON J , NAIK A , RAGHOTHAMAN M , et al . GENSYNTH: Synthesizing datalog programs without language bias [J ] . Proceedings of the AAAI Conference on Artificial Intelligence , 2021 , 35 ( 7 ): 6444 - 6453 .
SI X J , LEE W , ZHANG R , et al . Syntax-guided synthesis of datalog programs [C ] // Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering . New York : ACM , 2018 : 515 - 527 .
RAGHOTHAMAN M , MENDELSON J , ZHAO D , et al . Provenance-guided synthesis of datalog programs [J ] . Proceedings of the ACM on Programming Languages , 2019 , 4(POPL): 62.
ALBARGHOUTHI A , KOUTRIS P , NAIK M , et al . Constraint-based synthesis of datalog programs [M ] // Lecture Notes in Computer Science . Cham : Springer , 2017 : 689 - 706 .
WONG W E , DEBROY V , GAO R Z , et al . The DStar method for effective software fault localization [J ] . IEEE Transactions on Reliability , 2014 , 63 ( 1 ): 290 - 308 .
PAPADAKIS M , LE TRAON Y . Metallaxis-FL: Mutation-based fault localization [J ] . Software Testing , Verification & Reliability, 2015 , 25 ( 5/6/7 ): 605 - 628 .
JIANG J J , WANG R , XIONG Y F , et al . Combining spectrum-based fault localization and statistical debugging: An empirical study [C ] // 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) . Piscataway : IEEE , 2019 : 502 - 514 .
ABREU R , ZOETEWEIJ P , VAN GEMUND A J C . On the accuracy of spectrum-based fault localization [C ] // Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION . Piscataway : IEEE , 2007 : 89 - 98 .
LIU Y , LI Z , WANG L , et al . Statement-oriented mutant reduction strategy for mutation based fault localization [C ] // 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS) . Piscataway : IEEE , 2017 : 126 - 137 .
LIU Y , LI Z , WANG L X , et al . Statement-oriented mutant reduction strategy for mutation based fault localization [C ] // 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS) . Piscataway : IEEE , 2017 : 126 - 137 .
JONES J A , HARROLD M J , STASKO J . Visualization of test information to assist fault localization [C ] // Proceedings of the 24th International Conference on Software Engineering . Piscataway : IEEE , 2002 : 467 - 477 .
MOON S , KIM Y , KIM M , et al . Ask the mutants: Mutating faulty programs for fault localization [C ] // 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation . Piscataway : IEEE , 2014 : 153 - 162 .
ZELLER A , HILDEBRANDT R . Simplifying and isolating failure-inducing input [J ] . IEEE Transactions on Software Engineering , 2002 , 28 ( 2 ): 183 - 200 .
COUSOT P . Abstract interpretation [J ] . ACM Computing Surveys , 1996 , 28 ( 2 ): 324 - 328 .
AIKEN A . Introduction to set constraint-based program analysis [J ] . Science of Computer Programming , 1999 , 35 ( 2/3 ): 79 - 111 .
BRAVENBOER M , SMARAGDAKIS Y . Strictly declarative specification of sophisticated points-to analyses [C ] // Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications . New York : ACM , 2009 : 243 - 262 .
GUYER S Z , LIN C . Client-driven pointer analysis [M ] // Static Analysis . Berlin : Springer , 2003 : 214 - 236 .
CLARKE E , GRUMBERG O , JHA S , et al . Counterexample-guided abstraction refinement [M ] // Computer Aided Verification . Berlin : Springer , 2000 : 154 - 169 .
COUSOT P , MONERAU M . Probabilistic abstract interpretation [C ] // European Symposium on Programming . Berlin : Springer , 2012 : 169 - 193 .
RICHARDSON M , DOMINGOS P . Markov logic networks [J ] . Machine Learning , 2006 , 62 ( 1 ): 107 - 136 .
BECKMAN N E , NORI A V . Probabilistic, modular and scalable inference of typestate specifications [C ] // Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation . New York : ACM , 2011 : 211 - 221 .
KREMENEK T , TWOHEY P , BACK G , et al . From uncertainty to belief: Inferring the specification within [C ] // Proceedings of the 7th Symposium on Operating Systems Design and Implementation . New York : ACM , 2006 : 161 - 176 .
LIVSHITS B , NORI A V , RAJAMANI S K , et al . Merlin: Specification inference for explicit information flow problems [J ] . ACM Sigplan Notices , 2009 , 44 ( 6 ): 75 - 86 .
RUSSELL S . Unifying logic and probability [J ] . Communications of the ACM , 2015 , 58 ( 7 ): 88 - 97 .
GOODMAN N D . The principles and practice of probabilistic programming [J ] . ACM SIGPLAN Notices , 2013 , 48 ( 1 ): 399 - 402 .
POOLE D . Probabilistic Horn abduction and Bayesian networks [J ] . Artificial Intelligence , 1993 , 64 ( 1 ): 81 - 129 .
MUGGLETON S . Stochastic logic programs [EB/OL ] . [ 2023-09-29 ] . https://www.semanticscholar.org/paper/Stochastic-Logic-Programs-Muggleton/454742f723d5bdeb023d4a65ede020e881d68ae4 https://www.semanticscholar.org/paper/Stochastic-Logic-Programs-Muggleton/454742f723d5bdeb023d4a65ede020e881d68ae4 .
POOLE D . The independent choice logic and beyond [M ] // Probabilistic Inductive Logic Programming: Theory and Applications . Berlin : Springer , 2008 : 222 - 243 .
KERSTING K , DE RAEDT L . Basic principles of learning Bayesian logic programs [M ] // Probabilistic Inductive Logic Programming . Berlin : Springer , 2008 : 189 - 221 .
KIMMIG A , BACH S , BROECHELER M , et al . A short introduction to probabilistic soft logic [C ] // Proceedings of the NIPS Workshop on Probabilistic Programming: Foundations and Applications . Funner : NeurIPS , 2012 : 1 - 4 .
ZHANG D , DANG Y , LOU J G , et al . Software analytics as a learning case in practice: Approaches and experiences [C ] // Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering . Lawrence : ACM , 2011 : 55 - 58 .
VIDONI M . A systematic process for mining software repositories: Results from a systematic literature review [J ] . Information and Software Technology , 2022 , 144 : 106791 .
0
Views
40
下载量
0
CSCD
Publicity Resources
Related Articles
Related Author
Related Institution
京公网安备11010802024621