

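1. School of Computer Science, Beijing Information Science and Technology University, Beijing 100101
2. School of Computer Science and Technology, Nantong University, Nantong, Jiangsu 226019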
Received: 14 July 2022
Revised: 24 February 2023
Published: 25 January 2024
WANG Hao-ren, CUI Zhan-qi, YUE Lei, et al. Software Fault Localization Based on Redundant Coverage Information Reduction[J]. Acta Electronica Sinica, 2024, 52(01): 324-337. DOI: 10.12263/DZXB.20220820
As the scale and complexity of software increase, it becomes more difficult to ensure its quality and reliability. Software fault localization techniques are among the most important means of assuring software quality and reliability, and spectrum-based fault localization (SFL) is the most widely used of them. SFL calculates the suspiciousness values of code statements by analyzing the statement coverage matrix and locates the faulty statements according to these values. However, the statement coverage matrix suffers from serious data redundancy, which severely impairs the fault localization performance of SFL. For instance, in more than half of the statement coverage matrices of the 395 programs in the Defects4J dataset, 90% of the statements have at least one other statement with identical coverage information. Feature selection, a common data preprocessing technique, obtains a valuable subset of the original feature set by removing redundant and irrelevant features. We therefore take the statement coverage matrix as the original feature set, model the reduction of redundant coverage information as a feature selection problem, and propose a software fault localization approach based on redundant coverage information reduction (FLRR). First, feature selection techniques are applied to reduce the statement coverage matrix, which comprises both the statement coverage information and the test case execution results, yielding a subset of the matrix. Second, SFL is used to calculate the suspiciousness values of the statements in this subset, and the statements are sorted in descending order of suspiciousness to locate the faulty statements. In this paper, six common feature selection techniques are used to reduce the statement coverage matrix, and four typical SFL techniques are then used to localize the faulty statements in the resulting subset. To evaluate the fault localization performance of FLRR, comparative experiments against four typical SFL techniques were conducted on the Defects4J dataset, using $E_{\mathrm{inspect}}@n$ and MRR (Mean Reciprocal Rank) as evaluation metrics. Experimental results show that FLRR can effectively improve the fault localization performance of SFL. For $E_{\mathrm{inspect}}@n$ with $n=1$, FLRR located 23, 26, 14, and 13 more faulty statements than DStar, Ochiai, Barinel, and OP2, respectively, increases of 69.70%, 76.47%, 45.16%, and 38.24%; for MRR, FLRR improved on DStar, Ochiai, Barinel, and OP2 by 20.08%, 24.94%, 17.45%, and 19.15%, respectively.
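The abstract names four SFL techniques without stating their formulas. The sketch below derives the usual per-statement spectrum counts from a coverage matrix (ef/ep: failing/passing tests that execute the statement; nf/np: failing/passing tests that do not) and applies the textbook forms of Ochiai, DStar (with the exponent set to 2), Barinel, and OP2. It assumes a 0/1 matrix with tests as rows and statements as columns; the function names and the toy data are hypothetical, and this is an illustration of the standard formulas, not the authors' implementation.

```python
import math

def spectrum(coverage, failed):
    """Return (ef, ep, nf, np) for every statement.

    coverage[t][s] == 1 if test t executes statement s; failed[t] is True if test t fails.
    """
    counts = []
    for s in range(len(coverage[0])):
        ef = ep = nf = np_ = 0
        for row, fail in zip(coverage, failed):
            if row[s] and fail:
                ef += 1
            elif row[s]:
                ep += 1
            elif fail:
                nf += 1
            else:
                np_ += 1
        counts.append((ef, ep, nf, np_))
    return counts

def ochiai(ef, ep, nf, np_):
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

def dstar(ef, ep, nf, np_, star=2):
    denom = ep + nf
    if denom == 0:
        # executed by every failing test and by no passing test
        return math.inf if ef else 0.0
    return ef ** star / denom

def barinel(ef, ep, nf, np_):
    executed = ef + ep
    return 1 - ep / executed if executed else 0.0

def op2(ef, ep, nf, np_):
    # ep + np_ is the total number of passing tests
    return ef - ep / (ep + np_ + 1)

def rank(coverage, failed, formula):
    """Statement indices by descending suspiciousness; ties keep index order (stable sort)."""
    scores = [formula(*c) for c in spectrum(coverage, failed)]
    return sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)

# Toy data: 4 tests (rows) x 3 statements (columns)
coverage = [
    [1, 1, 0],  # t0 fails
    [1, 0, 1],  # t1 passes
    [1, 1, 1],  # t2 fails
    [1, 0, 0],  # t3 passes
]
failed = [True, False, True, False]

for formula in (ochiai, dstar, barinel, op2):
    print(formula.__name__, rank(coverage, failed, formula))  # each prints [1, 0, 2]
```

Here statement 1 is executed by both failing tests and by no passing test, so all four formulas place it first.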
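The redundancy statistic quoted for Defects4J counts statements whose coverage information is identical to that of at least one other statement. A minimal way to compute that share for a single matrix, assuming the same tests-by-statements 0/1 layout (the function name and data are hypothetical):

```python
from collections import Counter

def redundant_ratio(coverage):
    """Share of statements whose coverage column equals at least one other column.

    coverage[t][s] == 1 if test t executes statement s.
    """
    columns = [tuple(row[s] for row in coverage) for s in range(len(coverage[0]))]
    freq = Counter(columns)  # how many statements share each column
    return sum(1 for col in columns if freq[col] > 1) / len(columns)

coverage = [
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
]
print(redundant_ratio(coverage))  # 0.75: statements 0, 1 and 3 share one column
```

Because the Ochiai, DStar, Barinel, and OP2 scores depend only on a statement's (ef, ep, nf, np) counts, statements with identical columns always receive identical suspiciousness values and therefore tie in the ranking.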
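The two FLRR steps (reduce the coverage matrix with feature selection, then rank the retained statements with SFL) can be sketched as follows. The abstract does not say which six feature selection techniques are used or how many statements are kept, so this sketch substitutes one simple filter: it scores each statement by the absolute Pearson correlation between its coverage column and the failure vector and keeps the top k. The value of k, the helper names, and the use of Ochiai in the second step are illustrative assumptions. The sketch ranks only the retained statements; how FLRR handles discarded statements is not described in the abstract.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation undefined; treat the column as uninformative
    return cov / math.sqrt(vx * vy)

def select_statements(coverage, failed, k):
    """Hypothetical filter: keep the k statements most correlated with test failure."""
    labels = [1 if f else 0 for f in failed]
    scores = [abs(pearson([row[s] for row in coverage], labels))
              for s in range(len(coverage[0]))]
    best = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)[:k]
    return sorted(best)  # column indices of the reduced matrix

def ochiai_rank(coverage, failed, selected):
    """Rank only the selected statements by Ochiai suspiciousness."""
    score = {}
    for s in selected:
        ef = sum(1 for row, f in zip(coverage, failed) if row[s] and f)
        ep = sum(1 for row, f in zip(coverage, failed) if row[s] and not f)
        nf = sum(1 for row, f in zip(coverage, failed) if not row[s] and f)
        denom = math.sqrt((ef + nf) * (ef + ep))
        score[s] = ef / denom if denom else 0.0
    return sorted(score, key=score.get, reverse=True)

coverage = [
    [1, 1, 0],  # t0 fails
    [1, 0, 1],  # t1 passes
    [1, 1, 1],  # t2 fails
    [1, 0, 1],  # t3 passes
]
failed = [True, False, True, False]

kept = select_statements(coverage, failed, k=2)
print(kept, ochiai_rank(coverage, failed, kept))  # [1, 2] [1, 2]
```

Statement 0 is executed by every test, so its column carries no information about failure and the filter drops it before the SFL step.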
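The two evaluation metrics can be sketched under their common definitions, which the abstract does not spell out: $E_{\mathrm{inspect}}@n$ counts the faults whose highest-ranked faulty statement appears within the top n positions, and MRR averages 1/rank of that statement over all faults, with a fault that is never ranked contributing 0. Ties are resolved here simply by list position; the paper may use a different tie-breaking strategy, and the data are hypothetical.

```python
def first_fault_rank(ranking, faulty):
    """1-based position of the first faulty statement in a ranking, or None."""
    for pos, stmt in enumerate(ranking, start=1):
        if stmt in faulty:
            return pos
    return None

def e_inspect_at_n(results, n):
    """Number of faults whose first faulty statement is ranked within the top n."""
    ranks = (first_fault_rank(r, f) for r, f in results)
    return sum(1 for pos in ranks if pos is not None and pos <= n)

def mrr(results):
    """Mean reciprocal rank over faults; a fault that is never ranked contributes 0."""
    total = 0.0
    for ranking, faulty in results:
        pos = first_fault_rank(ranking, faulty)
        total += 1.0 / pos if pos is not None else 0.0
    return total / len(results)

# Hypothetical results for three faults: (ranked statements, faulty statements)
results = [
    (["s3", "s1", "s2"], {"s3"}),  # found at rank 1
    (["s1", "s4", "s2"], {"s2"}),  # found at rank 3
    (["s5", "s6"], {"s9"}),        # never ranked
]
print(e_inspect_at_n(results, 1), e_inspect_at_n(results, 3), round(mrr(results), 4))  # 1 2 0.4444
```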
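The abstract reports only how many more faulty statements FLRR located at $n=1$ and the corresponding relative increases. Assuming each percentage is the extra count divided by the baseline technique's $E_{\mathrm{inspect}}@1$ count, the implied baseline and FLRR counts can be recovered and the percentages rechecked. These counts are derived here by arithmetic, not quoted from the paper.

```python
# Reported at n = 1: (extra faulty statements located by FLRR, relative increase in %)
REPORTED = {
    "DStar": (23, 69.70),
    "Ochiai": (26, 76.47),
    "Barinel": (14, 45.16),
    "OP2": (13, 38.24),
}

def implied_counts(extra, pct):
    """Back out (baseline, FLRR, recomputed %) assuming pct = 100 * extra / baseline."""
    baseline = round(extra / (pct / 100))
    return baseline, baseline + extra, 100 * extra / baseline

for name, (extra, pct) in REPORTED.items():
    baseline, flrr, check = implied_counts(extra, pct)
    print(f"{name}: baseline {baseline}, FLRR {flrr}, increase {check:.2f}%")
# DStar: baseline 33, FLRR 56, increase 69.70%
# Ochiai: baseline 34, FLRR 60, increase 76.47%
# Barinel: baseline 31, FLRR 45, increase 45.16%
# OP2: baseline 34, FLRR 47, increase 38.24%
```

Each recomputed percentage matches the reported value to two decimal places (for example, 23/33 ≈ 69.70%), which is consistent with the assumed definition of the increase.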