Abstract

Online reviews are an important asset for users deciding whether to buy a product, see a movie, or go to a restaurant, as well as for managers making business decisions. Reviews on e-commerce websites are usually accompanied by ratings, which help users make sense of the reviews. However, many reviews spread across forums and social media are written in plain text without ratings; in this paper we call them non-rated reviews. From the perspective of sentiment analysis, this study develops a predictive framework to estimate ratings for non-rated reviews. The framework rests on two observations: (1) the rating of a review depends on the sentiment scores of its aspects as well as on the number of positive and negative aspects it contains; (2) the sentiment score of an aspect is determined by its context. Treating the term-pairs that co-occur with aspects as their context, we propose a variant of the Conditional Random Field model, called SentiCRF, that generates term-pairs and calculates their sentiment scores from a training set. We then develop a cumulative logit model that uses the aspects in a review and their sentiments to predict the review's rating. In addition, calculating sentiment scores of term-pairs raises a class imbalance problem, which we tackle with a proposed heuristic re-sampling method. Experiments conducted on the YELP dataset demonstrate that the predictive framework is feasible and effective for predicting review ratings.
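The rating-prediction step can be illustrated with a minimal sketch of a generic cumulative logit (proportional odds) model. It assumes a hypothetical three-feature vector per review (mean aspect sentiment score, number of positive aspects, number of negative aspects) together with made-up coefficients `BETA` and cut-points `THRESHOLDS` for a 1 to 5 star scale. The abstract does not specify the framework's actual features or fitted parameters, so this is not the authors' model.

```python
import math
from typing import List, Sequence

# Hypothetical parameters for illustration only (not fitted on any data).
# Feature order: [mean aspect sentiment, #positive aspects, #negative aspects]
BETA = [2.0, 0.5, -0.7]
# Strictly increasing cut-points theta_1..theta_4 for a 5-level rating scale.
THRESHOLDS = [-2.0, -0.5, 0.5, 2.0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_logit_probs(x: Sequence[float], beta: Sequence[float],
                           thresholds: Sequence[float]) -> List[float]:
    """Return P(Y = k | x) for k = 1..K under a cumulative logit model.

    logit P(Y <= j | x) = theta_j - beta . x, for j = 1..K-1.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(b * xi for b, xi in zip(beta, x))  # linear predictor beta . x
    # Cumulative probabilities P(Y <= j), with P(Y <= K) = 1.
    cum = [sigmoid(t - eta) for t in thresholds] + [1.0]
    # Category probabilities are successive differences of the cumulative ones.
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


def predict_rating(x: Sequence[float], beta: Sequence[float] = BETA,
                   thresholds: Sequence[float] = THRESHOLDS) -> int:
    """Pick the most probable rating level (1-based)."""
    probs = cumulative_logit_probs(x, beta, thresholds)
    return max(range(len(probs)), key=probs.__getitem__) + 1


if __name__ == "__main__":
    # A review with mostly positive aspects vs. one with mostly negative aspects.
    print(predict_rating([0.8, 3, 1]))
    print(predict_rating([-0.9, 0, 4]))
```

Because a positive coefficient shifts the linear predictor up and lowers every cumulative probability P(Y <= j), reviews with higher aspect sentiment and more positive aspects move toward higher ratings, while the ordered cut-points keep the predicted categories consistent with the ordinal star scale.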
