Start Date

12-16-2013

Description

Sentiment analysis is widely used to study important topics in business intelligence. Although many studies have reported interesting results using machine learning, the lack of theoretical analysis and the shortage of practical guidance hinder theory development. Moreover, because labeling data is difficult, the effectiveness of sentiment analysis that relies only on labeled data is open to question. In this paper, we drew on statistical learning theory to perform an extensive theoretical analysis of sentiment analysis using real corporate financial reports. Guided by the theory, we investigated when and why machine learning methods perform favorably. We also provided practical suggestions on applying machine learning methods for both researchers and practitioners. In addition, we used cheap and ubiquitous unlabeled data to further improve sentiment analysis performance. This has the potential to greatly reduce manual data labeling work and to scale up the experiments.


Effective Sentiment Analysis of Corporate Financial Reports
