A One-Cluster Kernel PCM Based SVDD Method for Outlier Detection
YANG Jin-hong1, DENG Ting-quan1,2
1. College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001, China;
2. College of Science, Harbin Engineering University, Harbin, Heilongjiang 150001, China
Support vector data description (SVDD) is adversely affected when its training set contains both normal samples and outliers that are all labeled as the target class. To reduce the influence of these outliers on the trained model, an SVDD outlier detection method based on one-cluster kernel possibilistic C-means (PCM) is proposed. One-cluster kernel PCM clustering yields, for each training sample, a membership degree in the normal class, and this membership degree is used as the sample's confidence level of belonging to the target class. The confidence levels are then incorporated into the SVDD training model so that low-confidence samples play a smaller role in establishing the decision boundary. Experiments show that, compared with existing SVDD-based outlier detection methods, the proposed method significantly improves outlier detection performance.
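The abstract does not give the update formulas, so the following is only a minimal sketch of how such confidence levels might be computed, not the authors' implementation. It assumes a Gaussian kernel, the standard possibilistic C-means membership update with a single cluster center in feature space, a fuzzifier m = 2, and a scale parameter eta estimated once from the first pass and then held fixed. The function names, the parameters sigma and C, and the idea of using C * u_i as the per-sample upper bound on the SVDD dual variables are all illustrative assumptions.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel; the kernel choice is an assumption."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def one_cluster_kernel_pcm(X, sigma=1.0, m=2.0, n_iter=50, tol=1e-6):
    """Return one membership degree in (0, 1] per sample (requires m > 1).

    The single cluster center lives in feature space and is never formed
    explicitly; squared distances to it are computed from the kernel matrix.
    """
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) for j in range(n)]
         for i in range(n)]
    u = [1.0] * n          # start with full membership for every sample
    eta = None             # PCM scale parameter, fixed after the first pass
    for _ in range(n_iter):
        um = [ui ** m for ui in u]
        total = sum(um)
        w = [v / total for v in um]   # normalized weights defining the center
        Kw = [sum(w[j] * K[i][j] for j in range(n)) for i in range(n)]
        vv = sum(w[i] * Kw[i] for i in range(n))
        # Squared feature-space distance of each sample to the center.
        d2 = [max(K[i][i] - 2.0 * Kw[i] + vv, 0.0) for i in range(n)]
        if eta is None:
            eta = sum(w[i] * d2[i] for i in range(n))
            if eta <= 0.0:
                return u   # all samples coincide in feature space
        new_u = [1.0 / (1.0 + (d / eta) ** (1.0 / (m - 1.0))) for d in d2]
        change = max(abs(a - b) for a, b in zip(u, new_u))
        u = new_u
        if change < tol:
            break
    return u


def svdd_upper_bounds(u, C):
    """Hypothetical helper: per-sample box bounds C * u_i for a weighted
    SVDD dual, so low-confidence samples can carry less weight."""
    return [C * ui for ui in u]


if __name__ == "__main__":
    # Four tightly grouped samples and one distant sample.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0]]
    u = one_cluster_kernel_pcm(X, sigma=1.0)
    print([round(v, 3) for v in u])
    print([round(b, 3) for b in svdd_upper_bounds(u, C=0.5)])
```

In a weighted SVDD dual of this kind, the bounds would replace the uniform constraint 0 <= alpha_i <= C with 0 <= alpha_i <= C * u_i; for the problem to remain feasible under the usual constraint that the alpha_i sum to 1, the bounds must sum to at least 1.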