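1. E-Government Research Center, Chinese Academy of Governance, Beijing 100089, China
2. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
4. College of Information Technology, Beijing Union University, Beijing 100101, China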
Print publication: 2014
ZHAI Yun, WANG Shu-peng, MA Nan, et al. A Data Mining Method for Imbalanced Datasets Based on One-Sided Link and Distribution Density of Instances[J]. Acta Electronica Sinica, 2014, 42(7): 1311-1319. DOI: 10.3969/j.issn.0372-2112.2014.07.011.
Classification of imbalanced datasets is a major challenge in machine learning. The synthetic minority over-sampling technique (SMOTE) has become a powerful and widely adopted way to address it. When generating new instances, however, SMOTE synthesizes from all minority-class instances, which creates an over-generalization (overfitting) bottleneck. To better address this problem, this paper proposes a data mining method for imbalanced datasets based on the one-sided link and the distribution density of minority instances (One-Sided Link Distribution Density-SMOTE, OSLDD-SMOTE). OSLDD-SMOTE first uses the one-sided link to select the minority instances near the classification boundary, then generates new instances with SMOTE according to the dynamic distribution density of those instances. The effects of the synthetic degree on the number of nodes and on minority-class accuracy are analyzed, and OSLDD-SMOTE is compared with SMOTE, Borderline-SMOTE and Surrounding-SMOTE on the G-mean, F-measure and AUC metrics. In simulations on 8 UCI datasets, the proposed method gives the most accurate and robust performance on these metrics and effectively improves the classification accuracy of minority-class instances.
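As a rough illustration of two building blocks the abstract relies on, the sketch below shows plain SMOTE interpolation and the G-mean and F-measure metrics computed from confusion-matrix counts. It is not OSLDD-SMOTE: the abstract does not specify the one-sided link selection or the density weighting, so neither is attempted here. The function names (`smote_sample`, `g_mean`, `f_measure`) and the parameter `k` are assumptions made for this example only.

```python
import math
import random


def smote_sample(seed, minority, k=5, rng=random):
    """Return one synthetic point between `seed` and a near minority neighbour.

    Plain SMOTE only (hypothetical helper): no one-sided link selection and
    no density weighting, since the abstract does not define them.
    `seed` must be one of the tuples in `minority`.
    """
    # k nearest minority-class neighbours of the seed (Euclidean distance).
    neighbours = sorted(
        (p for p in minority if p is not seed),
        key=lambda p: math.dist(seed, p),
    )[:k]
    target = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return tuple(s + gap * (t - s) for s, t in zip(seed, target))


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity.

    Assumes at least one positive and one negative example.
    """
    sensitivity = tp / (tp + fn)  # recall on the minority (positive) class
    specificity = tn / (tn + fp)  # recall on the majority (negative) class
    return math.sqrt(sensitivity * specificity)


def f_measure(tp, fn, fp, beta=1.0):
    """F-measure of the minority class; beta=1 gives the usual F1 score.

    Assumes tp > 0 so that precision and recall are defined and nonzero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(smote_sample(minority[0], minority, k=2))
    print(g_mean(tp=8, fn=2, tn=90, fp=10))
    print(f_measure(tp=8, fn=2, fp=10))
```

Both metrics weigh minority-class recall heavily, which is why they are commonly preferred over overall accuracy when one class dominates the dataset.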