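1. Key Laboratory of Grain Information Processing and Control (Henan University of Technology), Ministry of Education, Zhengzhou, Henan 450001
2. College of Information Science and Engineering, Henan University of Technology, Zhengzhou, Henan 450001
3. Henan International Joint Laboratory of Grain Information Processing, Henan University of Technology, Zhengzhou, Henan 450001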
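CAO He-ling: female, born in May 1980 in Nanyang, Henan. Associate professor and master's supervisor at the College of Information Science and Engineering, Henan University of Technology; CCF senior member. Research interests: software analysis and testing, deep learning. E-mail: caohl@haut.edu.cn
LIU Yu: male, born in March 1997 in Zhengzhou, Henan. Master's student at the College of Information Science and Engineering, Henan University of Technology; CCF member. Research interests: software analysis and testing. E-mail: zzhautly@163.com
HAN Dong: male, born in January 1997 in Zhumadian, Henan. Master's student at the College of Information Science and Engineering, Henan University of Technology. Research interests: software analysis and testing. E-mail: handong531@163.com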
Received: 2022-06-27
Revised: 2022-11-19
Published in print: 2024-03-25
CAO He-ling, LIU Yu, HAN Dong. Self-Attention Neural Machine Translation for Automatic Software Repair[J]. Acta Electronica Sinica, 2024, 52(03): 945-956. DOI:10.12263/DZXB.20220734
Recurrent neural networks handle code sequences well, and most patch generation models for software fault repair are built on them. However, recurrent-network-based patch generation models remain limited when dealing with long-distance dependencies in code sequences, so their repair success rate and repair efficiency are low. To address this issue, we present SNRepair (Self-attention Neural machine translation based automatic software Repair), an automatic software fault repair method based on self-attention neural machine translation. First, subword tokenization is applied when preprocessing the dataset to alleviate the out-of-vocabulary problem in source code. Second, a Transformer patch generation model that integrates local modeling is constructed to handle long-distance dependencies in source code and to make fuller use of local information. Third, automatic fault localization is used to locate the suspicious faulty statements, and the parameter-optimized Transformer patch generation model generates candidate patches. Finally, the candidate patches are validated by running test cases. In an evaluation on the 395 real Java faults in the Defects4J benchmark, SNRepair achieves a higher repair success rate and higher repair efficiency than the compared methods.
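To make the idea of "self-attention with local modeling" concrete, the following is a minimal sketch, not the SNRepair model itself. It assumes one common way of adding locality: a Gaussian bias on the attention scores, in the spirit of Yang et al.'s "Modeling localness for self-attention networks". The function name `local_self_attention`, the fixed window parameter `sigma`, and the choice to centre the window on the query position are all illustrative assumptions; the abstract does not specify how SNRepair integrates local information.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def local_self_attention(queries, keys, values, sigma=None):
    """Scaled dot-product attention with an optional Gaussian locality bias.

    Assumption for illustration: the bias for query position i and key
    position j is -(i - j)^2 / (2 * sigma^2), with a fixed sigma.
    sigma=None disables the bias and gives plain self-attention.
    """
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            score = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            if sigma is not None:
                # Penalise distant tokens so nearby code tokens dominate.
                score -= (i - j) ** 2 / (2.0 * sigma ** 2)
            scores.append(score)
        weights.append(softmax(scores))
    width = len(values[0])
    outputs = [
        [sum(w * v[c] for w, v in zip(row, values)) for c in range(width)]
        for row in weights
    ]
    return outputs, weights


if __name__ == "__main__":
    # Toy "token embeddings": five identical vectors, so any difference
    # between the two weight rows comes from the locality bias alone.
    x = [[1.0, 0.0] for _ in range(5)]
    _, plain = local_self_attention(x, x, x)
    _, local = local_self_attention(x, x, x, sigma=1.0)
    print("plain :", [round(w, 3) for w in plain[2]])
    print("local :", [round(w, 3) for w in local[2]])
```

With the bias switched off, every position can attend to every other position, which is what lets a Transformer relate distant tokens; with the bias on, weight shifts toward neighbouring tokens. A model that combines both can therefore capture long-distance dependencies while still emphasising local context.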