Acta Electronica Sinica (电子学报)

• Academic Papers

Cost Sensitive Semi-Supervised Laplacian Support Vector Machine

WAN Jian-wu (万建武)1,2, YANG Ming (杨明)1,2, CHEN Yin-juan (陈银娟)1

  1. School of Computer Science and Technology, Nanjing Normal University, Nanjing, Jiangsu 210046, China;
  2. School of Mathematical Sciences, Nanjing Normal University, Nanjing, Jiangsu 210046, China
  • Received: 2011-05-26 Revised: 2011-12-23 Online: 2012-07-25 Published: 2012-07-25
    • About the authors:
    • WAN Jian-wu, male, was born in 1986 in Changzhou, Jiangsu. He is a student in the combined master's and doctoral program in applied mathematics at Nanjing Normal University. His research covers machine learning and pattern recognition. E-mail: xiaowunjnu@163.com
    • YANG Ming, male, was born in November 1964 and is a native of Ningguo, Anhui. He holds a Ph.D. and is a professor and doctoral supervisor. His main research areas include data mining, machine learning, and pattern recognition.
    • Supported by:
    • National Natural Science Foundation of China (No.60873176); Major Project of Natural Science Foundation of Jiangsu Province, China (No.BK2011005); Natural Science Foundation of Jiangsu Province, China (No.BK2011782)

Abstract: Cost sensitive learning is an active research topic in machine learning. In practical applications, datasets are often class-imbalanced, contain many unlabeled samples and only a few labeled ones, and include noise. Although cost sensitive learning methods for this setting have made some progress, further study is needed. This paper therefore proposes a cost sensitive semi-supervised Laplacian support vector machine. Building on a label propagation strategy that extends labels to the unlabeled samples, the model incorporates misclassification costs that account for class imbalance into both the empirical (hinge) loss and the Laplacian regularization term of the Laplacian support vector machine. To limit the influence of noisy samples on the decision hyperplane, an example-dependent cost is defined that assigns lower weights to noisy samples. Experimental results on 7 UCI datasets and 8 NASA software datasets demonstrate the effectiveness of the proposed algorithm.

Key words: cost sensitive learning, semi-supervised learning, Laplacian support vector machine

CLC number: