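1. School of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, Guangdong 510225
2. School of Computer Science, South China Normal University, Guangzhou, Guangdong 510631
3. School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006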
Published online: 2019-05-25
Published in print: 2019
HE Chao-bo, TANG Yong, ZHANG Qiong, et al. Short Text Online Clustering Based on Incremental Robust Nonnegative Matrix Factorization[J]. Acta Electronica Sinica, 2019, 47(5): 1086-1093. DOI: 10.3969/j.issn.0372-2112.2019.05.016.
Clustering the large volume of short texts produced by social media has significant practical value. However, short texts tend to be noisy, fast-growing, and massive in volume, so most existing short text clustering algorithms cannot process them effectively. To address this problem, we propose STOCIRNMF, a short text online clustering algorithm based on incremental robust nonnegative matrix factorization. STOCIRNMF builds its short text clustering model on nonnegative matrix factorization (NMF) and formulates the model's objective function with the $\ell_{2,1}$ norm to improve robustness. It also applies incremental iterative update rules so that short texts can be clustered online. Experiments on Sohu news title and Weibo short text datasets show that STOCIRNMF not only achieves better short text clustering performance than representative existing algorithms but also detects Weibo topics online effectively.
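To make the robustness idea in the abstract concrete, the following is a minimal sketch of batch NMF under an $\ell_{2,1}$ loss, where each document's reconstruction error enters as an unsquared Euclidean norm so that noisy documents carry less weight. It is not the STOCIRNMF algorithm: the abstract does not give the objective function or the incremental update rules, so the code uses the well-known multiplicative updates for $\ell_{2,1}$-norm NMF as a stand-in. The names `robust_nmf` and `l21_norm`, the term-by-document layout of `X`, and all parameter defaults are assumptions made for illustration.

```python
import numpy as np


def l21_norm(E):
    """l2,1 norm: sum of the Euclidean norms of the columns of E."""
    return float(np.sqrt((E ** 2).sum(axis=0)).sum())


def _doc_weights(X, W, H, eps):
    """Per-document weights 1 / ||x_i - W h_i||_2 (assumed reweighting scheme)."""
    resid = np.sqrt(((X - W @ H) ** 2).sum(axis=0))
    return 1.0 / np.maximum(resid, eps)


def robust_nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative X (terms x documents) by W @ H under an l2,1 loss.

    Illustrative batch sketch only; it does not implement incremental updates.
    Returns W (terms x k), H (k x documents), and the objective history.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((d, k)) + eps
    H = rng.random((k, n)) + eps
    history = [l21_norm(X - W @ H)]
    for _ in range(n_iter):
        # Update the basis matrix W with documents reweighted by residual size.
        w = _doc_weights(X, W, H, eps)
        W *= ((X * w) @ H.T) / (W @ ((H * w) @ H.T) + eps)
        # Recompute the weights, then update the coefficient matrix H.
        w = _doc_weights(X, W, H, eps)
        H *= (W.T @ (X * w)) / (W.T @ W @ (H * w) + eps)
        history.append(l21_norm(X - W @ H))
    return W, H, history
```

Under these assumptions, a cluster label for each document can be read off as `H.argmax(axis=0)`, that is, the factor with the largest coefficient in that document's column.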