Patch Verification Method Integrating Sentence Embedding Model and Code Features

JIANG Ting-ting; JIANG Shu-juan; HAN Wei

doi:10.12263/DZXB.20230048


Research article | Updated: 2025-12-11

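- Acta Electronica Sinica, 2023, Vol. 51, No. 12, pp. 3450-3456
- Author affiliation:
  School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
- About the authors:
  JIANG Ting-ting, female, born in October 1997 in Huaibei, Anhui. She is currently studying at China University of Mining and Technology. Her main research interests are software analysis and testing, and automatic program repair. E-mail: TS20170072P31@cumt.edu.cn
  JIANG Shu-juan (corresponding author), female, born in December 1966 in Laiyang, Shandong. She is a professor and doctoral supervisor at China University of Mining and Technology. Her main research interests are software analysis and testing, defect prediction, fault localization, and evolutionary computation. E-mail: shjjiang@cumt.edu.cn
- CLC number: TP311
- Received: 2023-01-13; revised: 2023-10-07; published in print: 2023-12-25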
JIANG Ting-ting, JIANG Shu-juan, HAN Wei. Patch Verification Method Integrating Sentence Embedding Model and Code Features[J]. Acta Electronica Sinica, 2023, 51(12): 3450-3456. DOI: 10.12263/DZXB.20230048.

Abstract

Patch verification usually checks the correctness of a patch by running a test suite. However, automatic repair techniques often generate a huge number of patches, and running each patch through the test suite in turn incurs a prohibitive overhead. To address this problem, this paper proposes a static patch verification method that combines the sentence embedding model InferSent with a support vector machine (SVM) classifier. InferSent extracts static features of the code, and the SVM classifier uses them to predict patch correctness. Because the method relies on static feature information of the code, it can effectively predict the correctness of patches generated by automatic repair tools without running a test suite. The method is validated on patch sets generated by several automatic repair tools. The experimental results show that it achieves an F1 score of 71.89% for patch prediction on these patch sets, which is 11.64% and 6.43% higher than two other state-of-the-art static patch verification methods, respectively, and that it outperforms the comparison models on all five evaluation metrics. These results indicate that the method can correctly predict patch correctness without running the test suite and that it generalizes well.
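The pipeline described in the abstract (embed the code, then classify with an SVM, then score with F1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed` is a hashed bag-of-tokens stand-in for InferSent, `patch_features` is a hypothetical way of combining the buggy and patched embeddings, and `train_linear_svm` is a small Pegasos-style linear SVM without a bias term rather than the SVM configuration used in the paper. All function names and parameters are invented for the example.

```python
import hashlib
import math
import random
import re

DIM = 64  # assumed size of the stand-in embedding; InferSent vectors are far larger


def embed(code):
    """Stand-in for a sentence encoder: hashed bag-of-tokens, L2-normalised.
    This is NOT InferSent; it only plays the same role (text -> fixed-length vector)."""
    vec = [0.0] * DIM
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code):
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def patch_features(buggy, patched):
    """Hypothetical feature scheme: element-wise difference and product
    of the buggy-code and patched-code embeddings, concatenated."""
    b, p = embed(buggy), embed(patched)
    return [x - y for x, y in zip(b, p)] + [x * y for x, y in zip(b, p)]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularised hinge loss.
    Labels must be +1 (correct patch) or -1 (incorrect patch). No bias term."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularisation)
            if margin < 1.0:  # sample violates the margin: move towards it
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Given labelled (buggy, patched) code pairs, one would build `X = [patch_features(b, p) for b, p in pairs]`, train with `train_linear_svm(X, y)`, and report `f1_score` on held-out patches. A realistic setup would replace `embed` with the pretrained InferSent encoder and the hand-rolled trainer with an established SVM library.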


