Journal of Computer Applications (计算机应用) 2013/33:12 PP.3559-3562
For the purpose of events extraction from large-scale short posts of microblogging service, a complete event detection and tracking algorithm was proposed using cloud framework. First, based on the number of forward and comment of the microblog, the posts were expressed as Vector Space Model (VSM). Then the keywords were extracted using RIHDBSCAN (Incremental Hierarchical DBSCAN based on Representative posts) to realize the event detection and tracking. Considering that a single node cannot quickly and efficiently handle the large amount of data, the algorithm would be deployed on Hadoop, a cloud computing platform. The experiment on real microblog data extracted from Sina microblogging platform shows that the proposed method achieves higher performance than that of TF-IDF (Term Frequency-Inverse Document Frequency) and UF-ITUF (User Frequency-Inverse Thread User Frequency), and the use of cloud framework improves the processing speed. Therefore, it is suitable for data analysis and mining on huge datasets.