An Outlier Detection Method Based on Ranking and Clustering in Bi-typed Heterogeneous Network
PENG Tao¹,², YANG Ni-ya¹, XU Yuan-bo¹, WANG Bing-bing¹, LIU Lu¹
1. College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China;
2. Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun, Jilin 130012, China
Abstract: Mining outliers that differ from normal data objects in a network is one of the important tasks in data mining. At present, relatively little research addresses outlier detection in bi-typed heterogeneous information networks, and methods designed for homogeneous networks cannot be applied to bi-typed heterogeneous networks. We therefore propose a Rank-Kmeans Based Outlier detection method, called RKBOutlier, for heterogeneous information networks. Two types of objects, together with the semantic information carried by the links between them, are extracted from the heterogeneous information network. One type is treated as the attribute objects and the other as the target objects. We partition the target objects into clusters and examine the distribution of the attribute objects in each cluster. Objects whose data distribution is abnormal are considered outliers. Ranking and clustering are combined to significantly improve clustering accuracy. Experimental results show that RKBOutlier can effectively detect outliers in bi-typed heterogeneous information networks.
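The general idea in the abstract (cluster the target objects, then flag those whose attribute distribution deviates from their cluster) can be illustrated with a minimal sketch. This is not the authors' RKBOutlier: the ranking step is omitted entirely, and the plain k-means, the farthest-first initialization, the Euclidean outlier score, the toy link matrix, and all function names are assumptions made only for illustration.

```python
import math

def normalize(row):
    """Turn raw link counts into a distribution over attribute objects."""
    total = sum(row)
    return [x / total for x in row] if total else [0.0] * len(row)

def dist(p, q):
    """Euclidean distance between two distributions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each target object to its nearest cluster center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Recompute each center as the mean distribution of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def outlier_scores(links, k):
    """Score each target object by its distance to its cluster's mean
    attribute distribution (an assumed score, not the paper's)."""
    points = [normalize(r) for r in links]
    labels, centers = kmeans(points, k)
    scores = [dist(p, centers[l]) for p, l in zip(points, labels)]
    return scores, labels

# Hypothetical toy network: rows are target objects (e.g. authors),
# columns are attribute objects (e.g. venues), entries are link counts.
links = [
    [5, 4, 0, 0], [4, 5, 0, 0], [6, 5, 0, 0],   # group linked to attributes 0-1
    [0, 0, 5, 4], [0, 0, 4, 5], [0, 0, 5, 6],   # group linked to attributes 2-3
    [3, 3, 2, 0],                               # spreads across both groups
]
scores, labels = outlier_scores(links, k=2)
most_suspicious = max(range(len(scores)), key=scores.__getitem__)
print(most_suspicious, [round(s, 3) for s in scores])
```

In this toy data the last target object links to attribute objects from both groups, so its distribution sits farthest from its cluster's mean. A ranking step such as the one the abstract mentions would replace the raw normalized counts with rank-weighted scores before clustering.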
PENG Tao, YANG Ni-ya, XU Yuan-bo, WANG Bing-bing, LIU Lu. An Outlier Detection Method Based on Ranking and Clustering in Bi-typed Heterogeneous Network. Acta Electronica Sinica, 2018, 46(2): 281-288.