A Self-Adaptive Microblog Topic Tracking Method Based on User Relationships
BAI Wen-yan (1, 2), ZHANG Chuang (1), XU Ke-fu (1), ZHANG Zhi-ming (3)
1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China;
2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
3. Information Technologies Co. (Beijing) Ltd., Beijing 100089, China
Microblog posts are colloquial and short, and existing research on tracking topics in them remains limited. To address this, this article proposes a self-adaptive microblog topic tracking method based on user relationships. First, within the tracking time window, the set of candidate tweets is mapped into a feature space. Second, taking into account how tweets are distributed and what topic tracking aims to achieve, the method transforms the tweets' feature space. On this basis, an improved K-means clustering algorithm performs binary clustering on the tweet set, and the resulting relevant cluster serves as the target model of the current topic. Experiments on data collected from Twitter show that the method can follow the trends of hot topics and the evolution of their focus in real time, and that it improves the stability of topic tracking on microblogs. The method is also well suited to user recommendation and public opinion analysis.
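The binary-clustering step in the abstract can be illustrated with a small sketch. This section does not describe the authors' feature-space transformation, how user relationships are weighted, or what their K-means improvements are, so the code below substitutes assumptions of its own. Tweets become L2-normalised term-frequency vectors. Plain 2-means runs with cosine similarity, with one centroid seeded at the current topic vector and the other at the candidate tweet least similar to it. The names `to_feature_space`, `cosine`, `mean` and `binary_track` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Vector = Dict[str, float]  # sparse vector: term -> weight


def to_feature_space(tweet: str) -> Vector:
    """Map a tweet to an L2-normalised term-frequency vector (assumed representation)."""
    counts = Counter(tweet.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of sparse vectors, used as a cluster centroid."""
    total: Dict[str, float] = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def binary_track(tweets: List[str], topic: Vector, iters: int = 10) -> List[int]:
    """Split candidate tweets into two clusters and return indices of the topic-relevant one.

    Hypothetical sketch: plain 2-means with cosine similarity, not the paper's improved K-means.
    """
    vecs = [to_feature_space(t) for t in tweets]
    if not vecs:
        return []
    # Seed cluster 0 at the current topic model and cluster 1 at the least similar tweet.
    far = min(range(len(vecs)), key=lambda i: cosine(vecs[i], topic))
    centroids = [dict(topic), dict(vecs[far])]
    labels = [0] * len(vecs)
    for _ in range(iters):
        # Assignment step: each tweet joins the more similar centroid (ties go to cluster 0).
        new_labels = [0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1 for v in vecs]
        # Update step: recompute each non-empty cluster's centroid.
        for k in (0, 1):
            members = [v for v, lab in zip(vecs, new_labels) if lab == k]
            if members:
                centroids[k] = mean(members)
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
    return [i for i, lab in enumerate(labels) if lab == 0]


tweets = [
    "earthquake hits city",
    "earthquake rescue teams city",
    "new phone release today",
    "phone camera review",
]
print(binary_track(tweets, to_feature_space("earthquake city")))  # [0, 1]
```

In this toy window the two earthquake tweets form the relevant cluster. One plausible way to make tracking self-adaptive is to feed the relevant cluster's centroid back in as the next window's `topic`. This section does not specify that update rule, so treat it as an assumption.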