DOI: 10.3724/SP.J.1087.2013.03563

Journal of Computer Applications (计算机应用), 2013, 33(12), pp. 3563-3566

Identification method of spam comments in microblog based on AdaBoost

To address the large number of spam comments on microblogs, a method based on AdaBoost is proposed for identifying them. The method first extracts a feature vector of eight feature values to represent each comment. It then uses the AdaBoost algorithm to train several weak classifiers on these features, each of which performs better than random prediction. Finally, it combines the weighted weak classifiers into a strong classifier with high precision. Experimental results on comment data sets extracted from popular Sina microblogs indicate that the eight selected features are effective and that the method achieves a high recognition rate when identifying spam comments in microblogs.
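The pipeline described in the abstract (feature vectors, weak classifiers trained by AdaBoost, a weighted vote) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the eight features or the type of weak classifier, so the three toy features (URL count, comment length, similarity to the original post), the decision-stump weak learner, the toy data, and all function names below are assumptions made for illustration only.

```python
import math

# Hypothetical toy data: [url_count, comment_length, similarity_to_post].
# The paper uses eight features, which the abstract does not list.
TOY_X = [
    [2, 5, 0.1], [1, 8, 0.0], [0, 40, 0.7],
    [0, 25, 0.6], [3, 4, 0.2], [0, 30, 0.5],
]
TOY_Y = [1, 1, -1, -1, 1, -1]  # +1 = spam, -1 = normal comment


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # The stump predicts `sign` when feature j >= thr, else `-sign`.
                preds = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign


def adaboost_fit(X, y, rounds=5):
    """Train a list of (alpha, stump) pairs with the standard AdaBoost update."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # clamp to avoid division by zero
        if err >= 0.5:  # no better than random prediction: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this weak classifier
        ensemble.append((alpha, stump))
        # Raise the weights of misclassified samples, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Strong classifier: sign of the alpha-weighted vote of the weak classifiers."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Calling `adaboost_fit(TOY_X, TOY_Y, rounds=3)` and then `adaboost_predict(model, row)` classifies a new feature vector. On this toy data a single stump already separates the classes, so the example shows only the mechanics of the weighting and voting steps, not the recognition rate reported in the paper.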

Key words: microblog, spam comment identification, feature vector, AdaBoost algorithm, weak classifier

ReleaseDate:2014-07-21 16:58:55