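1. Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101
2. Institute of Computational Linguistics, Peking University, Beijing 100871
3. Beijing Laboratory of National Economic Security Early-warning Engineering, Beijing 100044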
Published online: 2019-09-25
Published in print: 2019
ZHANG Yang-sen, DUAN Yu-xiang, WANG Jian, et al. Microblog Bursty Events Detection Method Based on Multiple Word Features[J]. Acta Electronica Sinica, 2019, 47(9): 1919-1928. DOI: 10.3969/j.issn.0372-2112.2019.09.015.
In recent years, bursty events of many kinds have occurred frequently across many fields, affecting social stability and development. This paper proposes a microblog bursty event detection model based on multiple word features. The model detects bursty events in massive microblog data, helping decision-makers monitor microblogs and guide public opinion so as to reduce, as far as possible, the harm that bursty events cause to society. First, the model slices the microblog data into time windows according to the time information and, within each window, computes the word frequency feature, the topic tag feature, and the word frequency growth rate feature of each word. Then, D-S evidence theory and the analytic hierarchy process are used to determine the weight of each feature, and the weighted features are fused into a bursty feature value for each word. Words with large bursty feature values are selected to form the bursty feature word set, and a coupling degree matrix over this set is constructed from co-occurrence degree and binding tightness. Finally, the coupling degree matrix is used as the input of an agglomerative hierarchical clustering algorithm, which generates a binary tree whose leaf nodes are the bursty words, and a binary tree pruning algorithm based on internal similarity partitions the clustering result, thereby detecting the bursty events of the corresponding time window. The experimental results show that the bursty-word-based event detection model performs best when the intra-cluster similarity threshold is 1.1, with a precision of 0.8462, a recall of 0.8684, and an F value of 0.8571, which indicates the effectiveness of the proposed method.
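The first two stages described above (per-window word features, then weighted fusion into a bursty feature value) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weights in `WEIGHTS` are hypothetical placeholders for the values the paper derives with D-S evidence theory and the analytic hierarchy process, and the max-normalization, the add-one smoothing in the growth rate, and all function names are assumptions made for illustration. The coupling degree matrix and the clustering stage are omitted.

```python
from collections import Counter

# Hypothetical feature weights. The paper derives its weights with
# D-S evidence theory and the analytic hierarchy process.
WEIGHTS = {"freq": 0.4, "tag": 0.3, "growth": 0.3}


def window_counts(posts):
    """Count words and topic-tag words in one time window.

    posts: list of (tokens, tag_tokens) pairs, where tag_tokens are the
    words that appear inside the post's topic tags.
    """
    freq, tag = Counter(), Counter()
    for tokens, tag_tokens in posts:
        freq.update(tokens)
        tag.update(tag_tokens)
    return freq, tag


def normalize(values):
    """Scale values to [0, 1] by the maximum (an assumed normalization)."""
    top = max(values.values(), default=0)
    return {w: (v / top if top else 0.0) for w, v in values.items()}


def burst_scores(prev_posts, curr_posts, weights=WEIGHTS):
    """Fuse the three word features into one bursty feature value per word."""
    prev_freq, _ = window_counts(prev_posts)
    freq, tag = window_counts(curr_posts)
    # Growth relative to the previous window; add-one smoothing (assumption)
    # avoids division by zero, and negative growth is clipped to zero.
    growth = {
        w: max(0.0, (c - prev_freq[w]) / (prev_freq[w] + 1))
        for w, c in freq.items()
    }
    f, t, g = normalize(freq), normalize(tag), normalize(growth)
    return {
        w: weights["freq"] * f[w]
        + weights["tag"] * t.get(w, 0.0)
        + weights["growth"] * g[w]
        for w in freq
    }


def top_bursty_words(scores, k):
    """Return the k words with the largest bursty feature values."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    previous = [(["weather", "nice"], []), (["weather", "traffic"], [])]
    current = [
        (["earthquake", "sichuan", "weather"], ["earthquake"]),
        (["earthquake", "rescue"], ["earthquake"]),
        (["earthquake", "sichuan"], []),
    ]
    print(top_bursty_words(burst_scores(previous, current), 2))
```

In this toy data, a word that is frequent, tagged, and newly appearing in the current window receives the highest score, which is the behavior the weighted fusion is meant to capture. The selected words would then feed the coupling degree matrix and the clustering step.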