Outlier Detection Based on Reversed K-Nearest Neighborhood MST of Relative Distance Measure

YANG Xiao-ling, FENG Shan, YUAN Zhong

ACTA ELECTRONICA SINICA ›› 2020, Vol. 48 ›› Issue (5) : 937-945.

PDF(1702 KB)
CIE Homepage  |  Join CIE  |  Login CIE  |  中文 
PDF(1702 KB)
ACTA ELECTRONICA SINICA ›› 2020, Vol. 48 ›› Issue (5) : 937-945. DOI: 10.3969/j.issn.0372-2112.2020.05.014

Outlier Detection Based on Reversed K-Nearest Neighborhood MST of Relative Distance Measure

  • YANG Xiao-ling1,2, FENG Shan1, YUAN Zhong2
Author information +

Abstract

For outlier detection difficulty of data sets with complex distribution and various types of outliers, a new outlier detection method based on reversed k-nearest neighborhood MST of relative distance measure is proposed. Firstly, relative distance of object is defined with the combination of classical distance, local density and neighborhood of object, which can be used to detect global outliers and local outliers both. Secondly, on basis of minimum spanning tree structure, by tactics of maximum-edge-cutting, outliers and outlier clusters can be obtained. Finally, experiments of synthetic and UCI data sets show that the new algorithm is much more correct and effective. It is a new effective way for detecting outliers of data sets with abnormal distribution and diversity outlier types.

Key words

outlier / outlier cluster / reversed nearest neighborhood / MST(minimum spanning tree) / relative distance metric

Cite this article

Download Citations
YANG Xiao-ling, FENG Shan, YUAN Zhong. Outlier Detection Based on Reversed K-Nearest Neighborhood MST of Relative Distance Measure[J]. Acta Electronica Sinica, 2020, 48(5): 937-945. https://doi.org/10.3969/j.issn.0372-2112.2020.05.014

References

[1] Xue Z X,Shang Y L,Feng A F.Semi-supervised outlier detection based on fuzzy rough C-means clustering[J].Mathematics and Computers in Simulation,2010,80(9):1911-1921.
[2] Han J W,Kamber M,Pei J.Data Mining:Concepts and Techniques[M].San Francisco:Morgan Kaufmann,2011.543-584.
[3] 薛安荣,姚林,鞠时光,陈伟鹤,马汉达.离群点挖掘方法综述[J].计算机科学,2008,35:11-18. Xue A R,Yao L,Ju S G,Chen W H,Ma H D.Survey of outlier mining[J].Computer Science,2008,35:11-18.(in Chinese)
[4] Markus G,Seiichi U,Dongxiao Z.A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data[J].PLOS ONE,2016,11(4):1-31.
[5] Aggarwal,Charu C.Outlier Analysis[M].Berlin,Germany:Second Edition,Springer,2017.
[6] Wu D F.A regression sequences based method for high dimensional outlier detection[J].Journal of Discrete Mathematical Sciences and Cryptography,2017,20(4):931-943.
[7] Goldstein M,Dengel A.Histogram-based outlier score(HBOS):A fast unsupervised anomaly detection algorithm[A].Poster and Demo Track of the 35th German Conference on Artificial Intelligence (KI-2012)[C].Germany:2012.59-63.
[8] Ramaswamy S.Rastogi R.Shim K.Efficient algorithms for mining outliers from large data sets[A].ACM Sigmod International Conference on Management of Data[C].New York,USA:ACM,2000.427-438.
[9] Jiang F,Sui Y F,Cao C G.A rough set approach to outlier detection[J].International Journal of General Systems,2008,37(5):519-536.
[10] Yuan Z,Zhang X Y,Feng S.Hybrid data-driven outlier detection based on neighborhood information entropy and its developmental measures[J].Expert Systems with Applications,2018,112(12):243-257.
[11] Chen Y M,Miao D Q,Zhang H Y.Neighborhood outlier detection[J].Expert Systems with Applications,2010,37(12):8745-8749.
[12] Wang X,Wang X L,Ma Y,et al.A fast MST-inspired kNN-based outlier detection method[J].Information Systems,2015,48(3):89-112.
[13] Tang X Q,Zhu P.Hierarchical clustering problems and analysis of fuzzy proximity relation on granular space[J].IEEE Transactions on Fuzzy Systems,2013,21(5):814-824.
[14] Yamanishi K,Takeuchi J I,Williams G,et al.On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms[J].Data Mining and Knowledge Discovery,2004,8(3):275-300.
[15] Wu N,Zhang J.Factor-analysis based anomaly detection and clustering[J].Decision Support Systems,2006,42(1):375-389.
[16] Knorr E M,Ng R T.Algorithms for mining distance-based outliers in large datasets[A].Proceedings of the 24th VLDB Conference[C].New York,USA:VLDB,1998.392-403.
[17] Ramaswamy S,Rastogi R,Kyuseok S.Efficient algorithms for mining outliers from large data sets[A].Proceedings of the 2000 ACM SIGMOD international conference on Management of data[C].New York,USA:ACM,2000.427-438.
[18] Angiulli F,Pizzuti C.Fast outlier detection in high-dimensional spaces[A].Proceeding of 6th European Conference on Principles of Data Mining and Knowledge Discovery[C].Finland (Helsinki):ACM,2002.15-26.
[19] Wu M,Jermaine C.Outlier Detection by sampling with accuracy guarantees[A].Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining[C].New York,USA:ACM,2006.767-772.
[20] 朱庆生,唐汇,冯骥.一种基于自然最近邻的离群检测算法[J].计算机科学,2014,41(3):276-278. Zhu Q S,Tang H,Feng J.Outlier detection algorithm based on natural nearest neighbor[J].Computer Science,2014,41(3):276-278.(in Chinese)
[21] Breunig M M,Kriegel H P,Ng R T,Sander J.LOF:Identifying density-based local outliers[A].ACM Sigmod International Conference on Management of Data[C].New York,USA:ACM,2000.93-104.
[22] 胡彩平,秦小麟.一种基于密度的局部离群点检测算法DLOF[J].计算机研究与发展,2010,47(12):2110-2116. Hu C P,Qin X L.A density-based local outlier detecting algorithm[J].Journal of Computer Research and Development,2010,47(12):2110-2116.(in Chinese)
[23] 王敬华,赵新想,张国燕.NLOF:一种新的基于密度的局部离群点检测算法[J].计算机科学,2013,40(8):181-185. Wang J H,Zhao X X,Zhang G Y.NLOF:A new density-based local outlier detecting algorithm[J].Computer Science.2013,40(8):181-185.(in Chinese)
[24] Jin W,Anthony K H,Han J W,Wang W.Ranking outliers using symmetric neighborhood relationship[A].Pacific-Asia Conference on Knowledge Discovery and Data Mining[C].Berlin,GER:Springer,2006.577-593.
[25] Radovanovic M,Nanopoulos A,Ivanovic M.Reverse nearest neighbors in unsupervised distance-based outlier detection[J].IEEE Transactions on Knowledge and Data Engineering,2015,27(5):1369-1382.
[26] 朱利,邱媛媛,于帅,原盛.一种基于快速k-近邻的最小生成树离群点检测方法[J].计算机学报,2017,40(12):2857-2870. Zhu L,Qiu Y Y,Yu S,Yuan S.A fast kNN-based MST outlier detection method[J].Chinese Journal of Computers,2017,40(12):2857-2870.(in Chinese)
[27] Zhu Q S,Fan X G,Feng J.Outlier detection based on K-Neighborhood MST[A].IEEE International Conference on Information Reuse and Integration[C].Las Vegas,USA:IEEE,2015.718-724.
[28] Alsuwaiyel M H.Algorithms Design Techniques and Analysis[M].Beijing:Publishing House of Electronics Industry,2013.239-246.
[29] Bay S D.The UCI KDDN repository[DB/OL].http://kdd.Ics.Uci.edu.2011-10-15.
[30] Harkin S,He H X,Williams G J,et al.Outlier detection using replicator neural networks[A].Proc of the 4th Int Conf on Data Warehousing and Knowledge Discovery[C].Aix-en-Provence:Springer-Verlag,2002.170-180.

Funding

National Natural Science Foundation of China (No.61673285, No.61976182, No.61572406); Youth Science and Technology Fund of Sichuan Provicne (No.2017JQ0046); Key Program of Sichuan Province International Science and Technology Innovation Cooperation Project (No.2019YFH0097)
PDF(1702 KB)

1058

Accesses

0

Citation

Detail

Sections
Recommended

/