Acta Electronica Sinica ›› 2018, Vol. 46 ›› Issue (1): 135-144. DOI: 10.3969/j.issn.0372-2112.2018.01.019

• Research Paper •

An Oversampling Method for Imbalance Data Based on Three-Way Decision Model

HU Feng, WANG Lei, ZHOU Yao

  1. Chongqing Key Laboratory of Computational Intelligence (Chongqing University of Posts and Telecommunications), Chongqing 400065, China
  • Received: 2016-05-10 Revised: 2016-10-31 Online: 2018-01-25 Published: 2018-01-25
    • About the authors:
    • HU Feng, male, born in July 1978 in Tianmen, Hubei, is a professor and master's supervisor. He received a B.S. degree from Chongqing University in 2000, an M.Eng. degree from Wuhan University in 2003, and a Ph.D. degree in engineering from Southwest Jiaotong University in 2011, and is now a faculty member at Chongqing University of Posts and Telecommunications. His main research interests include data mining, rough sets, and granular computing. E-mail: hufeng@cqupt.edu.cn; WANG Lei, male, born in 1989 in Dezhou, Shandong, is a master's student at Chongqing University of Posts and Telecommunications. His main research interests include data mining, three-way decisions, and rough sets.
    • Supported by:
    • National Natural Science Foundation of China (No.61309014, No.61379114, No.61472056); Humanities and Social Science Program of Ministry of Education (No.15XJA630003); Chongqing Research Program of Basic and Frontier Technology (No.cstc2013jcyjA40063, No.cstc2014jcyjA40049); Science and Technology Research of Chongqing Municipal Education Commission (No.KJ1500416)

Abstract: Sampling is an effective way to address the classification of imbalanced data. Drawing on three-way decision theory, we partition the samples, according to their distribution, into three regions: the positive region, the boundary region, and the negative region. We then apply different oversampling treatments to the minority-class samples in the boundary region and in the negative region, and on this basis propose an oversampling algorithm for imbalanced data based on three-way decisions, named TWD-IDOS. Experimental results show that, with the C4.5, KNN, and CART classifiers, the proposed algorithm effectively handles two-class classification of imbalanced data and outperforms the oversampling algorithms from the literature on Recall, F-value, and AUC.

Key words: three-way decision, neighborhood rough set, boundary sampling, imbalanced data, SMOTE
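
The abstract gives only the outline of the method, so the following is a minimal illustrative sketch of the general idea and not the TWD-IDOS algorithm itself. Everything specific in it is an assumption made for illustration: the function names `three_way_regions` and `oversample`, the neighborhood `radius`, the thresholds `alpha` and `beta`, the use of the minority share among neighbors as the decision score, and the SMOTE-style interpolation with a shorter step for negative-region samples.

```python
import numpy as np


def three_way_regions(X, y, minority=1, radius=1.0, alpha=0.7, beta=0.3):
    """Label each minority sample as 'POS', 'BND' or 'NEG'.

    Assumption (not from the paper): the score of a minority sample is the
    share of minority samples among the other points within `radius`.
    share >= alpha -> positive region, share <= beta -> negative region,
    anything in between -> boundary region.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    all_idx = np.arange(len(X))
    regions = {}
    for i in np.flatnonzero(y == minority):
        dist = np.linalg.norm(X - X[i], axis=1)
        neigh = np.flatnonzero((dist <= radius) & (all_idx != i))
        # An isolated sample has no supporting neighbors: score it as 0.
        share = float(np.mean(y[neigh] == minority)) if neigh.size else 0.0
        if share >= alpha:
            regions[int(i)] = "POS"
        elif share <= beta:
            regions[int(i)] = "NEG"
        else:
            regions[int(i)] = "BND"
    return regions


def oversample(X, y, regions, minority=1, n_bnd=2, n_neg=1, seed=0):
    """Create synthetic minority samples for boundary and negative regions.

    Assumption (not from the paper): new points are interpolated toward the
    nearest other minority sample, SMOTE-style. Negative-region samples use
    a shorter maximum step so the new points stay close to the original.
    Positive-region samples are left untouched.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority)
    new_points = []
    for i, region in regions.items():
        if region == "POS":
            continue
        others = min_idx[min_idx != i]
        if others.size == 0:
            continue
        gaps = np.linalg.norm(X[others] - X[i], axis=1)
        nearest = others[np.argmin(gaps)]
        n_new, max_step = (n_bnd, 1.0) if region == "BND" else (n_neg, 0.5)
        for _ in range(n_new):
            step = rng.uniform(0.0, max_step)
            new_points.append(X[i] + step * (X[nearest] - X[i]))
    if not new_points:
        return X, y
    X_out = np.vstack([X, np.array(new_points)])
    y_out = np.concatenate([y, np.full(len(new_points), minority, dtype=y.dtype)])
    return X_out, y_out
```

Calling `three_way_regions(X, y)` and passing its result to `oversample(X, y, regions)` returns the original data followed by the synthetic minority samples. Treating the two regions differently mirrors the abstract's statement that boundary-region and negative-region minority samples receive different oversampling treatments; the particular treatments chosen here are placeholders.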

CLC number: