Start Date
14-12-2012 12:00 AM
Description
Sentiment classification is one of the most extensively studied problems in sentiment analysis, and supervised learning methods, which require labeled data for training, have proven quite effective. However, supervised methods assume that the training domain and the testing domain share the same distribution; when they do not, accuracy drops dramatically. This is not a problem when training data are readily available, but in some circumstances labeled data are quite expensive to acquire. For instance, to detect sentiment in tweets or Facebook comments, the only way to obtain labeled data is to label it manually, which is prohibitively burdensome and time-consuming. In this paper, we propose a hybrid approach that integrates labeled data from multiple source domains with a set of preselected sentiment words to solve this problem. The experimental results suggest that our method outperforms the state of the art with statistical significance and even surpasses the in-domain method in some cases.
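The abstract does not say how the labeled source-domain data and the preselected sentiment words are combined, so the following is only a minimal sketch of what such a hybrid could look like, not the authors' method. It trains a binary Naive Bayes classifier on labeled documents pooled from several source domains and adds its log-odds to a score from a small seed lexicon, so that target-domain sentiment words never seen in the sources can still contribute. The word lists, the `alpha` mixing weight, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical seed lexicon; the paper's actual preselected word list is not given here.
POS_WORDS = {"good", "great", "excellent", "love"}
NEG_WORDS = {"bad", "poor", "terrible", "hate"}


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def train_pooled_nb(source_domains):
    """Train a binary multinomial Naive Bayes on data pooled from several source domains.

    source_domains: list of domains, each a list of (text, label) with label in {+1, -1}.
    """
    counts = {+1: Counter(), -1: Counter()}
    docs = Counter()
    for domain in source_domains:
        for text, label in domain:
            counts[label].update(tokenize(text))
            docs[label] += 1
    vocab = set(counts[+1]) | set(counts[-1])
    return counts, docs, vocab


def nb_log_odds(model, text):
    """Return log P(+|text) - log P(-|text) using add-one (Laplace) smoothing."""
    counts, docs, vocab = model
    total_docs = docs[+1] + docs[-1]
    # Smoothed class priors, so a domain missing one class does not cause log(0).
    score = math.log((docs[+1] + 1) / (total_docs + 2))
    score -= math.log((docs[-1] + 1) / (total_docs + 2))
    pos_total = sum(counts[+1].values()) + len(vocab)
    neg_total = sum(counts[-1].values()) + len(vocab)
    for tok in tokenize(text):
        # Words unseen in every source domain carry no evidence for the classifier.
        if tok in vocab:
            score += math.log((counts[+1][tok] + 1) / pos_total)
            score -= math.log((counts[-1][tok] + 1) / neg_total)
    return score


def lexicon_score(text):
    """Count of positive seed words minus count of negative seed words."""
    toks = tokenize(text)
    return sum(t in POS_WORDS for t in toks) - sum(t in NEG_WORDS for t in toks)


def hybrid_predict(model, text, alpha=1.0):
    """Predict +1 or -1 from NB log-odds plus alpha times the lexicon score.

    alpha is a hypothetical mixing weight, not a parameter taken from the paper.
    """
    score = nb_log_odds(model, text) + alpha * lexicon_score(text)
    return 1 if score >= 0 else -1


# Usage: two tiny labeled source domains, one unlabeled target-domain sentence.
books = [("a great story", +1), ("a boring plot", -1)]
dvds = [("excellent picture quality", +1), ("poor sound and blurry", -1)]
model = train_pooled_nb([books, dvds])
print(hybrid_predict(model, "i love this phone"))  # → 1 (only the lexicon word "love" fires)
```

In this toy run none of the target sentence's words appear in the pooled source vocabulary, so the prediction comes entirely from the seed lexicon; this illustrates why combining the two information sources can help when the target domain's vocabulary differs from the sources'.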
Recommended Citation
Fang, Fang; Datta, Anindya; and Dutta, Kaushik, "A Hybrid Method for Cross-domain Sentiment Classification Using Multiple Sources" (2012). ICIS 2012 Proceedings. 2.
https://aisel.aisnet.org/icis2012/proceedings/KnowledgeManagement/2
A Hybrid Method for Cross-domain Sentiment Classification Using Multiple Sources