Acta Electronica Sinica ›› 2020, Vol. 48 ›› Issue (5): 937-945. DOI: 10.3969/j.issn.0372-2112.2020.05.014

• Academic Paper •

Outlier Detection Based on Reversed K-Nearest Neighborhood MST of Relative Distance Measure

YANG Xiao-ling1,2, FENG Shan1, YUAN Zhong2

  1. School of Mathematical Science, Sichuan Normal University, Chengdu, Sichuan 610066, China;
  2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
  • Received: 2019-07-04 Revised: 2019-09-09 Online: 2020-05-25 Published: 2020-05-25
  • Corresponding author: FENG Shan
  • About the author: YANG Xiao-ling, female, was born in January 1993 in Leshan, Sichuan. She received her B.S. degree from China West Normal University in 2016 and her M.S. degree from Sichuan Normal University in 2019. She is currently a Ph.D. candidate in computer science and technology at Southwest Jiaotong University, working on rough sets, outlier detection, and data mining. E-mail: yangxlt1993@163.com
  • Supported by: National Natural Science Foundation of China (No.61673285, No.61976182, No.61572406); Youth Science and Technology Fund of Sichuan Province (No.2017JQ0046); Key Program of Sichuan Province International Science and Technology Innovation Cooperation Project (No.2019YFH0097)

Abstract: Outlier detection is difficult on data sets with complex distributions and diverse outlier types. To address this problem, an outlier detection method based on a reversed k-nearest neighborhood tree of relative distance, RKNMOD (Reversed K-Nearest Neighborhood), is proposed. First, the relative distance of an object is defined by combining the classical Euclidean distance, the local density of the object, and the neighborhood of the object, so that both global and local outliers can be detected effectively. Second, on the basis of a minimum spanning tree (MST) structure, a maximum-edge-cutting strategy is used to quickly separate outliers and outlier clusters. Finally, experiments on synthetic and UCI data sets show that the new algorithm achieves higher detection accuracy, providing an effective new approach to outlier detection on data sets with abnormal distributions and diverse outlier types.

Key words: outlier, outlier cluster, k-nearest neighbor, reversed nearest neighborhood, MST (minimum spanning tree), relative distance metric
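The maximum-edge-cutting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' RKNMOD algorithm: the abstract does not give the definition of the relative distance, so plain Euclidean distance is used here as a stand-in, and the function names and the parameters `n_cuts` and `max_outlier_size` are hypothetical choices made only for this illustration. The sketch builds a minimum spanning tree over the points, removes the longest edges, and reports the points in small leftover components as outliers or outlier clusters.

```python
import math


def minimum_spanning_tree(points, dist=math.dist):
    """Prim's algorithm on the complete graph; returns (weight, i, j) edges.

    `dist` is a stand-in for the paper's relative distance (assumption).
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n       # tree vertex that realizes that cost
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        # Relax connection costs of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def detect_outliers_by_edge_cutting(points, n_cuts, max_outlier_size):
    """Cut the n_cuts longest MST edges; small components are outliers."""
    edges = sorted(minimum_spanning_tree(points))
    kept = edges[:max(0, len(edges) - n_cuts)]

    # Union-find over the kept edges to recover connected components.
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)

    components = {}
    for i in range(len(points)):
        components.setdefault(find(i), []).append(i)

    return sorted(
        i
        for members in components.values()
        if len(members) <= max_outlier_size
        for i in members
    )


if __name__ == "__main__":
    # One dense cluster, one isolated point, and one two-point outlier cluster.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (20, 0), (20, 0.5)]
    print(detect_outliers_by_edge_cutting(pts, n_cuts=2, max_outlier_size=2))
```

In this toy data set, cutting the two longest tree edges isolates the single distant point and the two-point group from the main cluster. How many edges to cut and how small a component must be to count as an outlier cluster are left as free parameters here; the paper's own criteria are not stated in the abstract.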

CLC number: