Acta Electronica Sinica ›› 2016, Vol. 44 ›› Issue (1): 115-122. DOI: 10.3969/j.issn.0372-2112.2016.01.017

• Research Paper •

Transfer Learning for Software Defect Prediction

CHENG Ming1,2, WU Guo-qing1,2, YUAN Meng-ting1,2

  1. School of Computer, Wuhan University, Wuhan, Hubei 430072, China;
  2. State Key Lab of Software Engineering, Wuhan University, Wuhan, Hubei 430072, China
  • Received: 2014-06-06 Revised: 2015-05-14 Online: 2016-01-25 Published: 2016-01-25
    • About the authors:
    • CHENG Ming, male, born in 1985 in Zhengzhou, Henan, is a Ph.D. candidate at the School of Computer, Wuhan University. Research interests: software engineering, defect prediction, machine learning. E-mail: chengming@whu.edu.cn. WU Guo-qing, male, born in 1954, is a professor and doctoral supervisor. Research interests: software engineering, software evolution. E-mail: wgq@whu.edu.cn
    • Supported by:
    • National Natural Science Foundation of China (No.91118003, No.61003071); Special Fund for the Development of Strategic Emerging Industries in Shenzhen, Guangdong Province (No.JCYJ20120616135936123)

Abstract:

Traditional software defect prediction methods adapt poorly to cross-project defect prediction, mainly because the source and target projects have different feature distributions. To address this problem, we propose a novel weighted naive Bayes transfer learning algorithm. The algorithm first collects feature information from the training data and the test data, then computes the feature differences and converts the differences between the projects' data into weights on the training data, and finally builds the defect prediction model on the weighted data. Experiments on eight open-source project datasets show that, compared with other methods, the proposed method significantly improves cross-project defect prediction performance.

Key words: software defect prediction, transfer learning, machine learning, naive Bayes
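
The three steps named in the abstract (collect feature information, turn feature differences into training weights, build the model on weighted data) can be illustrated with a minimal sketch. The abstract does not state how the differences are converted into weights, so everything specific here is an assumption for illustration only: the range-overlap weight, the Gaussian likelihood, and the function names `range_similarity_weights`, `fit_weighted_gaussian_nb` and `predict` are hypothetical and are not taken from the paper.

```python
import math


def range_similarity_weights(train_X, test_X):
    """Assumed weighting rule (not from the paper): a training row counts more
    when more of its features lie inside the [min, max] range of the test data."""
    n_features = len(test_X[0])
    lows = [min(row[j] for row in test_X) for j in range(n_features)]
    highs = [max(row[j] for row in test_X) for j in range(n_features)]
    weights = []
    for row in train_X:
        inside = sum(1 for j in range(n_features) if lows[j] <= row[j] <= highs[j])
        # Add-one smoothing keeps every weight strictly positive.
        weights.append((inside + 1) / (n_features + 1))
    return weights


def fit_weighted_gaussian_nb(train_X, train_y, weights, var_floor=1e-6):
    """Estimate weighted class priors, feature means and feature variances."""
    total_w = sum(weights)
    n_features = len(train_X[0])
    model = {}
    for label in sorted(set(train_y)):
        rows = [(x, w) for x, y, w in zip(train_X, train_y, weights) if y == label]
        class_w = sum(w for _, w in rows)
        means, variances = [], []
        for j in range(n_features):
            mean = sum(w * x[j] for x, w in rows) / class_w
            var = sum(w * (x[j] - mean) ** 2 for x, w in rows) / class_w
            means.append(mean)
            variances.append(var + var_floor)  # floor avoids division by zero
        model[label] = (class_w / total_w, means, variances)
    return model


def predict(model, x):
    """Return the label with the highest log posterior under the fitted model."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, mean, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data: two made-up metrics per module; label 1 means "defective".
source_X = [[10, 1], [12, 2], [11, 1], [50, 9], [55, 8], [52, 10], [500, 90]]
source_y = [0, 0, 0, 1, 1, 1, 0]
target_X = [[9, 1], [13, 2], [51, 9], [54, 10]]

w = range_similarity_weights(source_X, target_X)
nb = fit_weighted_gaussian_nb(source_X, source_y, w)
print([predict(nb, x) for x in target_X])
```

In this sketch the source row `[500, 90]` lies far outside the target project's feature ranges, so it receives the smallest weight and contributes less to the class statistics than rows that resemble the target data. Any other similarity measure could be substituted in `range_similarity_weights` without changing the rest of the code.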

CLC number: