Classifying Investor Sentiment in Microblogs: A Transfer Learning Approach

Shuyuan Deng, Grand Valley State University
Dong-Heon Kwak, Kent State University
Jiao Wu, Central Michigan University
Atish Sinha, University of Wisconsin-Milwaukee
Huimin Zhao, University of Wisconsin-Milwaukee

Description

Microblogs have become a rich source of public opinion in recent years, motivating both researchers and practitioners to mine that opinion to predict stock returns (Li et al. 2018). To extract user opinion from microblog messages, prior studies have commonly used sentiment analysis. Sentiment analysis (or opinion mining) is a class of computational approaches that assess the directional state of emotions (e.g., positive or negative) in text (Abbasi et al. 2011). The sentiment analysis techniques used in prior research fall into two categories (Deng et al. 2017). The first is lexicon-based: a sentiment lexicon containing a predefined list of sentiment words is used to determine the sentiment strength of a document from word counts. The second is supervised learning, in which a sentiment classifier is trained on data with known sentiment. Given a large, high-quality training data set, supervised learning typically performs better. However, such a data set is often unavailable. Training data for sentiment analysis typically must be annotated manually, which makes creating a large data set extremely costly. Most prior studies extracting investor sentiment from microblogs have used only thousands of messages as training data. Such training sets are quite small relative to the millions of messages that need to be analyzed. How to take advantage of the vast volume of microblog messages to improve sentiment classification remains a challenging question.

This study addresses that question using a transfer learning approach (Pan and Yang 2010). We formulate a task that classifies the sentiment of StockTwits messages as positive, neutral, or negative. StockTwits has a large number of messages with users’ self-reported sentiment (i.e., bullish or bearish). To utilize this large two-class data set for our three-class task, we proceed in two steps. First, we use these self-labeled messages to train a deep neural network (DNN) that predicts the binary sentiment classes, and we retain the weights of its first few layers. Next, for the three-class task, we create a small manually annotated data set for cross-validation. For this task, we use another DNN that has the same first few layers as the first DNN but a different output layer. We expect knowledge learned from the larger two-class data set to transfer effectively to the three-class sentiment classification task.
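The layer-sharing idea can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the layer sizes, the ReLU activation, the random initialization, and the helper names (`build_network`, `transfer`, `forward`) are all assumptions made for illustration, and training is omitted entirely. In practice the source network would first be trained on the bullish/bearish messages, and the target network would then be trained on the small annotated set.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return a dense layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x):
    """Apply one dense layer: weights @ x + biases."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def build_network(sizes, rng):
    """Build a feed-forward network; sizes = [input, hidden..., output]."""
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(network, x):
    """ReLU on every hidden layer, softmax on the output layer."""
    for layer in network[:-1]:
        x = relu(dense(layer, x))
    return softmax(dense(network[-1], x))


def transfer(source, n_classes, rng):
    """Copy every layer of `source` except its output layer, then attach
    a freshly initialized output layer with `n_classes` units."""
    shared = [([row[:] for row in w], b[:]) for w, b in source[:-1]]
    n_hidden = len(source[-1][0][0])  # input width of the source output layer
    return shared + [init_layer(n_hidden, n_classes, rng)]


rng = random.Random(0)

# Source network: two output classes (bullish, bearish).
# Hypothetical sizes: 8 input features, two hidden layers of 16 units.
source_net = build_network([8, 16, 16, 2], rng)

# Target network: same hidden layers, new three-class output layer
# (positive, neutral, negative).
target_net = transfer(source_net, 3, rng)

probs = forward(target_net, [0.1] * 8)
```

Here `target_net` starts from the hidden-layer weights of `source_net` and differs only in its output layer. Whether the copied layers are then frozen or fine-tuned on the annotated data is a design choice the description above does not specify.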

 
Aug 16th, 12:00 AM
