ZHANG Yang-sen, DUAN Yu-xiang, WANG Jian, et al. Microblog Bursty Events Detection Method Based on Multiple Word Features[J]. Acta Electronica Sinica, 2019, 47(9): 1919-1928. DOI: 10.3969/j.issn.0372-2112.2019.09.015.
Microblog Bursty Events Detection Method Based on Multiple Word Features
A wide variety of bursty events have been occurring frequently in many fields, impacting both the stability and the development of society. This paper proposes an event detection model based on multiple word features, intended to detect bursty events in massive microblog data. The model can assist decision-makers in monitoring microblogs and guiding public opinion, and can help minimize the negative effects of bursty events on society. First, the model slices the microblog data into time windows according to the time information. In each time window, the word frequency feature, the topic tag feature, and the word frequency growth rate feature of each word are calculated separately. Then, D-S evidence theory and the analytic hierarchy process are used to determine the weights of each word's features, which are then merged to obtain the bursty feature value of the word. Words with large bursty feature values are selected to form the bursty feature word set, and a coupling degree matrix of this word set is constructed based on co-occurrence degree and tightness. Finally, the coupling degree matrix is used as the input of a hierarchical agglomerative clustering algorithm to generate a binary tree whose leaf nodes are bursty words, and an internal similarity binary tree pruning algorithm is used to divide the clustering results. In this way, the bursty events of the corresponding time window are detected. The experimental results show that the event detection model based on bursty words performs best when the intra-cluster similarity threshold is 1.1: the precision is 0.8462, the recall is 0.8684, and the F-value is 0.8571, indicating the effectiveness of the proposed method.
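
As a quick consistency check on the reported figures, the sketch below recomputes the F value from the reported correct rate (precision) and recall. It assumes the F value is the standard balanced F-measure, i.e. the harmonic mean of precision and recall; the abstract does not state the formula, and the function name `f_measure` is illustrative rather than taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (balanced F1 when beta == 1)."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Figures reported in the abstract (already rounded to four decimals).
reported_precision = 0.8462
reported_recall = 0.8684
reported_f = 0.8571

f1 = f_measure(reported_precision, reported_recall)

# Because the inputs are rounded, only agreement within a small
# tolerance is expected, not an exact match in the last digit.
print(f"recomputed F = {f1:.6f}, reported F = {reported_f}")
print("consistent within rounding:", abs(f1 - reported_f) < 5e-4)
```

Under that assumption, the recomputed value agrees with the reported F value to within the rounding of the inputs, which suggests the three figures are mutually consistent.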