Twitter has become one of the largest microblogging platforms for users around the world to share anything happening around them with friends and beyond. A bursty topic in Twitter is one that triggers a surge of relevant tweets within a short period of time, which often reflects important events of mass interest. How to leverage Twitter for early detection of bursty topics has therefore become an important research problem with immense practical value. Despite the wealth of research on topic modelling and analysis in Twitter, detecting bursty topics in real time remains a challenge. As existing methods can hardly scale to handle the tweet stream in real time, we propose in this paper TopicSketch, a sketch-based topic model together with a set of techniques to achieve real-time detection. We evaluate our solution on a tweet stream with over 30 million tweets. Our experimental results show both the efficiency and the effectiveness of our approach. In particular, we demonstrate that TopicSketch on a single machine can potentially handle hundreds of millions of tweets per day, which is on the same scale as the total number of daily tweets in Twitter, and can present bursty events at a finer granularity. The workflow of TopicSketch is presented on the right-hand side.
The demo below illustrates why acceleration can be used to filter out general topics while preserving the bursty topics.
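The intuition can also be sketched in a few lines of code. The snippet is a minimal illustration, not the TopicSketch implementation: it assumes acceleration is estimated as the scaled difference between a short-window and a long-window exponentially smoothed arrival rate, and the function names, window lengths, and synthetic timestamps are all hypothetical. A general topic that arrives at a steady rate has nearly the same rate under both windows, so its acceleration stays close to zero however large its volume is, whereas a topic that has just started to surge has a clearly positive acceleration.

```python
import math


def smoothed_rate(timestamps, now, window):
    """Exponentially weighted arrival rate (tweets per time unit) at time `now`."""
    return sum(math.exp((t - now) / window) for t in timestamps if t <= now) / window


def acceleration(timestamps, now, short_window, long_window):
    """Scaled difference between the short-window and the long-window rate.

    Assumption for illustration: this two-window estimator stands in for
    whatever acceleration estimate the real system uses.
    """
    fast = smoothed_rate(timestamps, now, short_window)
    slow = smoothed_rate(timestamps, now, long_window)
    return (fast - slow) / (long_window - short_window)


# Synthetic general topic: one tweet per time unit over a long stretch.
steady = [float(t) for t in range(2000)]
# Synthetic bursty topic: silent at first, then 20 tweets per time unit
# during the last 50 time units.
bursty = [1950 + i / 20 for i in range(1000)]

now = 2000.0
steady_acc = acceleration(steady, now, short_window=15.0, long_window=60.0)
bursty_acc = acceleration(bursty, now, short_window=15.0, long_window=60.0)

print(f"general topic acceleration: {steady_acc:.4f}")
print(f"bursty topic acceleration:  {bursty_acc:.4f}")
```

Thresholding on a quantity like `bursty_acc` rather than on raw volume is what lets steady, high-volume topics drop out while sudden surges remain.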
To measure the coherence of the detected bursty topics, we adopt the word intrusion task proposed in "Reading Tea Leaves: How Humans Interpret Topic Models" (Jonathan Chang et al.). In this task, the subject is presented with six randomly ordered words, of which five are from a detected bursty topic and the remaining one is an intruding word. These words are presented together with an interactive plot that shows the count of each word over time, along with the tweets containing these words, within a time window around the detection time. Below is a screenshot of the user interface.
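The construction of a single intrusion question can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names `build_intrusion_task` and `precision` are hypothetical, the intruder is drawn uniformly at random from a pool of words outside the topic (the actual selection rule may differ), and the words and answers are made up for the example. Coherence is then summarized as the fraction of subjects who correctly pick the intruder.

```python
import random


def build_intrusion_task(topic_words, intruder_pool, rng):
    """Return (six shuffled words, intruder) for one detected topic.

    Assumption: the intruder is any pool word that does not belong to the topic.
    """
    top_five = list(topic_words[:5])
    candidates = [w for w in intruder_pool if w not in topic_words]
    intruder = rng.choice(candidates)
    words = top_five + [intruder]
    rng.shuffle(words)  # random order, so position does not reveal the intruder
    return words, intruder


def precision(answers, intruder):
    """Fraction of subjects whose answer matches the true intruder."""
    return sum(answer == intruder for answer in answers) / len(answers)


# Made-up example data, for illustration only.
rng = random.Random(0)
topic = ["earthquake", "tsunami", "magnitude", "coast", "warning"]
pool = ["concert", "election", "coast", "football"]

words, intruder = build_intrusion_task(topic, pool, rng)
answers = [intruder, intruder, words[0]]  # hypothetical subject responses
score = precision(answers, intruder)
```

A topic whose words hang together makes the intruder easy to spot, so a higher score suggests a more coherent topic.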
Some detected bursty topics are presented here for exploration.