Abstract

In the era of the Social Web, user-contributed comments posted to online social media have grown explosively. However, a growing volume of misleading and deceptive user comments on these platforms has become a serious concern for consumers and merchants, and social spam has drawn the attention of the legal community in recent years. Social spam can cause tremendous losses to both consumers and merchants, so there is a pressing need for effective methodologies to detect it and maintain the hygiene of online social media. The main contribution of this paper is a novel social spam detection methodology that combines word-, topic-, and user-based features. In particular, the proposed methodology is underpinned by the Labeled Latent Dirichlet Allocation (L-LDA) model, a probabilistic generative model. A series of experiments on social comments posted to YouTube shows that the proposed methodology achieves a detection accuracy of 91.17%. The business implication of our research is that merchants can apply our methodology to filter spam and thereby extract accurate market intelligence from online social media. Moreover, social media site owners can leverage the proposed methodology to maintain the hygiene of their sites.
