Start Date

14-12-2012 12:00 AM

Description

Sentiment classification is one of the most extensively studied problems in sentiment analysis, and supervised learning methods, which require labeled data for training, have proven quite effective. However, supervised methods assume that the training domain and the testing domain share the same distribution; when they do not, accuracy drops dramatically. This poses no problem when training data are readily available, but in some circumstances labeled data are quite expensive to acquire. For instance, if we want to detect sentiment in Tweets or Facebook comments, the only way to acquire labeled data is to annotate the text manually, which is prohibitively burdensome and time-consuming. In this paper, we propose a hybrid approach that integrates information from the labeled data of multiple source domains with a set of preselected sentiment words to solve this problem. The experimental results suggest that our method outperforms the state of the art by a statistically significant margin and even surpasses the in-domain method in some cases.

A Hybrid Method for Cross-domain Sentiment Classification Using Multiple Sources
