Acta Electronica Sinica (电子学报) ›› 2018, Vol. 46 ›› Issue (2): 281-288. DOI: 10.3969/j.issn.0372-2112.2018.02.004

• Research Paper •

双类型异质网中基于排序和聚类的离群点检测方法

彭涛1,2, 杨妮亚1, 徐原博1, 王冰冰1, 刘露1   

  1. 吉林大学计算机科学与技术学院, 吉林长春 130012;
    2. 符号计算与知识工程教育部重点实验室(吉林大学), 吉林长春 130012
  • 收稿日期:2016-09-26 修回日期:2016-11-14 出版日期:2018-02-25
    • 通讯作者:
    • 刘露
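    • About the authors:
    • PENG Tao, male, born in 1977, Ph.D., professor. His main research interests are data mining, information retrieval, and machine learning. YANG Ni-ya, female, born in 1992, master's degree. Her main research interests are data mining and machine learning. XU Yuan-bo, male, born in 1990, Ph.D. His main research interests are data mining and machine learning. WANG Bing-bing, female, born in 1990, master's degree. Her main research interests are data mining and search engines.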
    • 基金资助:
    • 国家自然科学基金 (No.60903098); 吉林大学研究生创新基金 (No.2016183,No.2016184)

An Outlier Detection Method Based on Ranking and Clustering in Bi-typed Heterogeneous Network

PENG Tao1,2, YANG Ni-ya1, XU Yuan-bo1, WANG Bing-bing1, LIU Lu1   

  1. College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China;
    2. Key Laboratory of Symbolic Computation and Knowledge Engineering(Jilin University), Ministry of Education, Changchun, Jilin 130012, China
  • Received: 2016-09-26 Revised: 2016-11-14 Online: 2018-02-25 Published: 2018-02-25
    • Corresponding author:
    • LIU Lu
    • Supported by:
    • National Natural Science Foundation of China (No.60903098); Postgraduate Innovation Fund of Jilin University (No.2016183, No.2016184)

摘要: 挖掘隐藏在网络中不同于正常数据对象的离群点是数据挖掘的重要任务之一.目前,针对双类型异质信息网络离群点检测的研究工作相对较少,原本适用于同质网络的离群点检测方法将很难适用于双类型异质网络.为此,提出了异质信息网络中基于排序和聚类的离群点检测方法(RKBOutlier).从异质信息网络中抽取两种类型的对象以及链接两种对象的语义信息,将待检测的数据作为属性对象,将另一类型数据作为目标对象,对目标对象进行聚类来检测属性对象在各个聚类中的分布情况,数据分布异常的对象即为离群点.将排序和聚类相结合来显著提高聚类的准确度.实验结果表明,RKBOutlier可以在双类型异质信息网络中有效地检测出离群点.

关键词: 离群点检测, 排序, 聚类, 目标对象, 属性对象

Abstract: Mining outliers, that is, objects that differ from the normal data objects in a network, is an important task in data mining. To date, relatively little research has addressed outlier detection in bi-typed heterogeneous information networks, and methods designed for homogeneous networks are difficult to apply to them. We therefore propose a Rank-Kmeans Based Outlier detection method, called RKBOutlier, for heterogeneous information networks. Two types of objects, together with the semantic information carried by the links between them, are extracted from the heterogeneous information network. The objects to be examined are treated as attribute objects, and the objects of the other type are treated as target objects. The target objects are clustered, and the distribution of each attribute object across the resulting clusters is examined; attribute objects whose distribution is abnormal are regarded as outliers. Ranking is combined with clustering to significantly improve clustering accuracy. Experimental results show that RKBOutlier can effectively detect outliers in bi-typed heterogeneous information networks.

Key words: outlier detection, ranking, clustering, target object, attribute object
