Acta Electronica Sinica, 2018, Vol. 46, Issue (11): 2725-2732. DOI: 10.3969/j.issn.0372-2112.2018.11.20

• Research Article •

The SVDD Classifier for Unbalanced Data Based on Density-Sensitive and Maximum Soft Margin

TAO Xin-min, LI Chen-xi, SHEN Wei, CHANG Rui, WANG Ruo-tong, LIU Yan-chao

  1. College of Engineering and Technology, Northeast Forestry University, Harbin, Heilongjiang 150040, China
  • Received: 2017-09-08 Revised: 2017-12-12 Online: 2018-11-25 Published: 2018-11-25
    • About the authors:
    • TAO Xin-min, male, born in 1973. Ph.D., professor. His main research areas are intelligent signal processing, fault diagnosis, and pattern recognition. E-mail: taoxinmin@nefu.edu.cn; LI Chen-xi, female, born in 1993. Master's student. Her research interests are fault diagnosis and pattern recognition. E-mail: chenxili0613@163.com
    • Supported by:
    • Fundamental Research Funds for the Central Universities (No.2572017EB02, No.2572017CB07); Northeast Forestry University "Double First-Class" Research Start-up Fund (No.411112438); Harbin Science and Technology Bureau Innovative Talent Fund (No.2017RAXXJ018); National Natural Science Foundation of China (No.31570547)

Abstract: To improve the classification performance of the conventional support vector domain description (C-SVDD) algorithm on unbalanced datasets, a density-sensitive maximum soft margin support vector domain description (DSMSM-SVDD) algorithm is proposed. Relative density is introduced for the majority-class samples to reflect the influence of the training samples' distribution in the original space on the optimal classification boundary, and a maximum soft margin regularization term is added to the objective function so that the C-SVDD classification boundary shifts toward the minority class, which improves classification performance. The algorithm first computes the relative density of each majority-class sample to reflect that sample's importance, and then feeds the training samples, together with their relative densities, into DSMSM-SVDD to perform classification. The experiments examine the relationships among the algorithm's parameters and their influence on classification performance, and give recommendations for parameter values. Finally, comparison experiments with C-SVDD show that the proposed algorithm outperforms C-SVDD on unbalanced data.

Key words: support vector domain description, unbalanced datasets, relative density
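
The abstract does not state how relative density is computed, so the following is only an illustrative sketch of the first step (scoring each majority-class sample by density). It assumes a k-nearest-neighbour definition: density is the inverse of the mean distance to the k nearest majority-class neighbours, normalised by the largest density. The function name `relative_density`, the parameter `k`, and the per-sample penalty `C * rho` are hypothetical and are not taken from the paper.

```python
import numpy as np

def relative_density(X_major, k=5):
    """Illustrative kNN-based relative density in (0, 1].

    Assumption: density_i = 1 / (mean distance to the k nearest
    neighbours), divided by the maximum density over all samples.
    The paper's actual definition may differ.
    """
    X = np.asarray(X_major, dtype=float)
    n = X.shape[0]
    if n < 2:
        return np.ones(n)
    k = min(k, n - 1)
    # Pairwise Euclidean distances via broadcasting, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row; column 0 is the zero self-distance, so skip it.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    density = 1.0 / (knn.mean(axis=1) + 1e-12)
    return density / density.max()

# Toy majority class: a tight cluster plus one isolated point.
rng = np.random.default_rng(0)
X_major = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)), [[5.0, 5.0]]])
rho = relative_density(X_major, k=5)

# Hypothetical use: scale a base penalty C per sample, so that
# isolated majority samples constrain the boundary less.
C = 1.0
penalties = C * rho
```

In this toy set the isolated point at (5, 5) receives the smallest relative density, so under the assumed weighting it would contribute least to the majority-class constraints.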

CLC number: