Acta Electronica Sinica, 2017, Vol. 45, Issue (12): 2978-2986. DOI: 10.3969/j.issn.0372-2112.2017.12.021

• Research Article •

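Unbalanced Support Vector Machine Coupling Negative-Samples Cutting with Asymmetric Misclassification Cost

GAO Lei-fu, ZHAO Shi-jie, YU Dong-mei, TU Jun

  1. Institute of Optimization and Decision, Liaoning Technical University, Fuxin, Liaoning 123000, China
  • Received: 2016-07-20 Revised: 2016-10-14 Online: 2017-12-25 Published: 2017-12-25
    • About the authors:
    • GAO Lei-fu, male, born in 1963 in Fuxin, Liaoning, is a professor and doctoral supervisor at Liaoning Technical University. His main research areas are optimization theory and applications and nonlinear dynamical systems. E-mail: gaoleifu@163.com; ZHAO Shi-jie, male, born in 1987 in Wulian, Shandong, is a doctoral candidate at Liaoning Technical University. His main research areas are artificial intelligence and data mining, and optimization and management decision-making. E-mail: zhao2008shijie@126.com; YU Dong-mei, female, born in 1986 in Anshan, Liaoning, holds a Ph.D. and is a lecturer at Liaoning Technical University. Her main research area is optimization theory and applications; TU Jun, male, born in 1982 in Quanjiao, Anhui, holds a Ph.D. and is a lecturer at Liaoning Technical University. His main research areas are intelligent algorithms and supply chain management.
    • Funding:
    • Joint Funding Project of the Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (No.20132121110009); Young Scientists Fund of the National Natural Science Foundation of China (No.51704140); Fund Project of the Education Department of Liaoning Province (No.L2015208, No.LJYL043)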

Abstract: When a standard support vector machine (SVM) is used to classify unbalanced data, the optimal hyperplane tends to be biased and a large number of positive samples are misclassified. This paper discusses why SVM fails on unbalanced data and the corresponding countermeasures. Because the optimal hyperplane of an SVM is determined entirely by a small number of support vectors, an SVM mathematical model based on a negative boundary sample cutting strategy is proposed. This model recognizes positive samples well only after the "training-cutting" step on the negative samples has been repeated many times, which is time-consuming. As a substitute, an equivalent cutting hyperplane technique that cuts more negative samples at one time is adopted, and an unbalanced SVM algorithm coupling negative-sample cutting with asymmetric misclassification cost is proposed. To improve the algorithm's ability to handle unbalanced data, an improved sine cosine algorithm (ISCA) is used to optimize the cutting offset, that is, the biased constant of the cutting hyperplane. Numerical experiments verify the necessity of optimizing the cutting offset, the strong optimization performance of ISCA, and the good recognition performance of the improved SVM algorithm on unbalanced datasets.
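The abstract states the idea without formulas, so the following Python sketch is only an illustration under stated assumptions, not the authors' model. It assumes a linear SVM trained by batch subgradient descent, class-specific penalties `c_pos` and `c_neg` standing in for the asymmetric misclassification cost, and a hypothetical cutting rule that drops every negative sample whose decision value lies above the shifted hyperplane f(x) = -offset. All function names, the training method, and the cutting rule are assumptions introduced here.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def decision(w: Vector, b: float, x: Vector) -> float:
    """Linear decision value f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def weighted_hinge_loss(w: Vector, b: float, X: Sequence[Vector],
                        y: Sequence[int], c_pos: float, c_neg: float) -> float:
    """0.5 * ||w||^2 + sum_i C(y_i) * max(0, 1 - y_i * f(x_i)).

    C(y_i) is c_pos for positive samples and c_neg for negative ones
    (assumed form of the asymmetric misclassification penalty).
    """
    loss = 0.5 * sum(wi * wi for wi in w)
    for x, yi in zip(X, y):
        c = c_pos if yi == 1 else c_neg
        loss += c * max(0.0, 1.0 - yi * decision(w, b, x))
    return loss


def train_linear_svm(X: Sequence[Vector], y: Sequence[int],
                     c_pos: float, c_neg: float,
                     lr: float = 0.01, epochs: int = 2000
                     ) -> Tuple[List[float], float]:
    """Minimize weighted_hinge_loss by plain batch subgradient descent."""
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 term
        grad_b = 0.0
        for x, yi in zip(X, y):
            if yi * decision(w, b, x) < 1.0:  # margin violated
                c = c_pos if yi == 1 else c_neg
                for j in range(dim):
                    grad_w[j] -= c * yi * x[j]
                grad_b -= c * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def cut_negatives(X: Sequence[Vector], y: Sequence[int],
                  w: Vector, b: float, offset: float
                  ) -> Tuple[List[Vector], List[int]]:
    """Hypothetical cutting rule: keep all positives, and keep a negative
    sample only if f(x) <= -offset, i.e. it lies beyond the cutting
    hyperplane f(x) = -offset on the negative side."""
    kept = [(x, yi) for x, yi in zip(X, y)
            if yi == 1 or decision(w, b, x) <= -offset]
    return [x for x, _ in kept], [yi for _, yi in kept]
```

One possible usage is to call `train_linear_svm` once, pass the resulting `w` and `b` to `cut_negatives` with a chosen `offset`, and retrain on the reduced set with `c_pos` larger than `c_neg`. The value of `offset` is the quantity the paper tunes with its optimizer.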

Key words: unbalanced data, support vector machine, boundary sample, cutting hyperplane, asymmetric misclassification cost, sine cosine algorithm
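For the sine cosine algorithm named in the key words, the abstract does not describe what the authors' improvement consists of. The sketch below therefore shows only the standard sine cosine update rule, applied to a single scalar variable such as a cutting offset. The function name `sine_cosine_minimize`, its parameters, and the asynchronous update of the best solution are assumptions made for illustration.

```python
import math
import random
from typing import Callable, Tuple


def sine_cosine_minimize(objective: Callable[[float], float],
                         lower: float, upper: float,
                         agents: int = 30, iterations: int = 200,
                         a: float = 2.0, seed: int = 0
                         ) -> Tuple[float, float]:
    """Standard sine cosine algorithm for one bounded scalar variable.

    This is NOT the improved variant (ISCA) from the paper; it is the
    basic update x <- x + r1 * sin(r2) * |r3 * best - x| (or cos).
    """
    rng = random.Random(seed)
    xs = [rng.uniform(lower, upper) for _ in range(agents)]
    best_x = min(xs, key=objective)
    best_f = objective(best_x)
    for t in range(iterations):
        # r1 shrinks linearly from a to 0: exploration first, then exploitation.
        r1 = a - t * a / iterations
        for i in range(agents):
            r2 = rng.uniform(0.0, 2.0 * math.pi)
            r3 = rng.uniform(0.0, 2.0)
            r4 = rng.random()
            wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
            x = xs[i] + r1 * wave * abs(r3 * best_x - xs[i])
            x = min(max(x, lower), upper)  # clip to the search interval
            xs[i] = x
            f = objective(x)
            if f < best_f:  # keep the best solution found so far
                best_x, best_f = x, f
    return best_x, best_f
```

A toy call such as `sine_cosine_minimize(lambda x: (x - 3.0) ** 2, -10.0, 10.0)` searches the interval for the minimizer of a quadratic. In the setting of the abstract, the objective would instead be some validation error of the SVM as a function of the cutting offset, which is an assumption about how the two pieces connect.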

CLC number: