Acta Electronica Sinica ›› 2018, Vol. 46 ›› Issue (2): 347-357. DOI: 10.3969/j.issn.0372-2112.2018.02.013

• Research Paper •

Soft Kernel Convex Hull Support Vector Machine for Large-Scale Noisy Data

GU Xiao-qing1,2, NI Tong-guang2, JIANG Zhi-bin1, WANG Shi-tong1

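  1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122;
  2. School of Information Science and Engineering, Changzhou University, Changzhou, Jiangsu 213164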
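  • Received: 2016-10-24 Revised: 2017-10-17 Published: 2018-02-25
    • Corresponding author:
    • NI Tong-guang
    • About the authors:
    • GU Xiao-qing, female, born in 1981 in Changzhou, Jiangsu. She received her Ph.D. from Jiangnan University in 2017 and is currently a lecturer at Changzhou University. Her research interests are pattern recognition and fuzzy systems. E-mail: czxqgu@163.com; JIANG Zhi-bin, male, born in 1991 in Yantai, Shandong. He is a Ph.D. candidate at Jiangnan University, and his main research interest is pattern recognition; WANG Shi-tong, male, born in 1964 in Yangzhou, Jiangsu. He is a professor and doctoral supervisor at Jiangnan University, working mainly on artificial intelligence, pattern recognition, fuzzy systems, medical image processing, and bioinformatics.
    • Supported by:
    • National Natural Science Foundation of China (No.61572236, No.61572085); Natural Science Foundation of Jiangsu Province (No.BK20160187)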

Soft Kernel Convex Hull Support Vector Machine for Large Scale Noisy Datasets

GU Xiao-qing1,2, NI Tong-guang2, JIANG Zhi-bin1, WANG Shi-tong1   

  1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China;
  2. School of Information Science and Engineering, Changzhou University, Changzhou, Jiangsu 213164, China
  • Received: 2016-10-24 Revised: 2017-10-17 Online: 2018-02-25 Published: 2018-02-25
    • Corresponding author:
    • NI Tong-guang
    • Supported by:
    • National Natural Science Foundation of China (No.61572236, No.61572085); Natural Science Foundation of Jiangsu Province, China (No.BK20160187)

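Abstract: Existing support vector machines (SVMs) for large-scale data classification are sensitive to noisy samples. To address this problem, a new soft kernel convex hull support vector machine (soft kernel convex hull support vector machine for large scale noisy datasets, SCH-SVM) is proposed by defining the soft kernel convex hull and introducing the pinball loss function. SCH-SVM first defines the concept of the soft kernel convex hull and then selects the soft kernel convex hull vectors that represent the geometric profile of the samples in the kernel space. The original-space samples corresponding to these vectors are then used as training samples, and the maximum quantile distance between the soft kernel convex hulls of the two classes is sought on the basis of the pinball loss function. Theoretical analysis and experimental results demonstrate the effectiveness of the proposed classifier in terms of training time, noise resistance, and number of support vectors.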

Key words: large-scale data, noise, soft kernel convex hull, pinball loss function, classification

Abstract: Existing support vector machines (SVMs) for large-scale data classification are sensitive to noisy samples. To overcome this problem, a new soft kernel convex hull support vector machine, called SCH-SVM, is proposed based on the soft kernel convex hull and the pinball loss function. SCH-SVM first extracts the soft kernel convex hull vectors, which represent the geometric profile of the data in the kernel space. It then takes the original-space samples that map to these vectors as the training samples and uses the pinball loss function to find the maximum quantile distance between the soft kernel convex hulls of the two classes. Theoretical analysis and numerical experiments show that SCH-SVM is effective in terms of training time, noise resistance, and the number of support vectors.

Key words: large scale datasets, noise, soft kernel convex hull, pinball loss function, classification
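
The abstract names the pinball loss without defining it. As a rough illustration only, the sketch below uses the definition commonly given in the pinball-loss SVM literature, with margin residual u = 1 - y*f(x); this page does not state the paper's exact formulation, so the function names and the parameter `tau` are assumptions made here. Unlike the hinge loss, the pinball loss also penalizes samples that lie far on the correct side of the margin, which is the usual argument for its lower sensitivity to noise near the decision boundary.

```python
def hinge_loss(y, score):
    """Standard hinge loss for a label y in {-1, +1} and decision value score."""
    return max(0.0, 1.0 - y * score)


def pinball_loss(y, score, tau=0.5):
    """Pinball loss as commonly defined for SVM classification (assumed form).

    u = 1 - y * score is the margin residual.
    For u >= 0 the loss equals u (same as hinge);
    for u < 0 the loss is -tau * u, so well-classified samples still pay a cost.
    tau is a hypothetical parameter name; tau = 0 recovers the hinge loss.
    """
    u = 1.0 - y * score
    return u if u >= 0 else -tau * u


if __name__ == "__main__":
    # Compare both losses on a misclassified, a borderline, and a far-correct sample.
    for score in (-1.0, 0.5, 3.0):
        print(score, hinge_loss(1, score), pinball_loss(1, score, tau=0.5))
```

With `tau=0.5`, a positive sample scored at 3.0 has zero hinge loss but a nonzero pinball loss, which shows how the loss depends on the whole distribution of margins rather than only on the samples closest to the boundary.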
