ANALYZING FILIPINO NEWS ARTICLES AND EDITORIALS THROUGH INFORMATION EXTRACTION AND SENTIMENT ANALYSIS
Abstract
How can we organize a large volume of news articles to facilitate better search and analysis? We propose the use of natural language processing techniques, specifically information extraction and sentiment analysis, to make data analysis of news articles and editorials easier. The proposed technique was tested on news documents written in Filipino. Grammar-based rules were formulated to extract pertinent information from the articles, and rule creation was automated through bootstrapping. The extracted information includes the Filipino equivalent of the 5W user requirement proposed by Das et al. (2012), which answers the questions who, what, when, where, and why. Subsequently, the articles related through the 5Ws were analyzed based on their sentiment. Both information extraction and sentiment analysis were done at the article level. Collective results were presented visually. In designing the user interface, we considered (1) how users would find the articles they are looking for, (2) how they would immediately see the important points in the articles, and (3) how to present the sentiment in each article and in the selected articles as a whole. To evaluate the performance of the information extraction and sentiment analysis, a gold standard was built against which the machine’s output was compared. The visualization system was also rated subjectively for usability.
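As an illustration of how extracted 5W output might be compared against a gold standard, the following is a minimal sketch only. The abstract does not state which metrics were used, so exact-match precision, recall, and F1 per field are an assumption here, and the names `FIELDS`, `score_field`, and `score_article`, as well as the sample values, are hypothetical.

```python
from typing import Dict, Set, Tuple

# The five fields of the 5W scheme (who, what, when, where, why).
FIELDS = ("who", "what", "when", "where", "why")


def score_field(gold: Set[str], predicted: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one field using exact string match.

    Assumption: an extracted item counts as correct only if it appears
    verbatim in the gold standard for the same field.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def score_article(
    gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]
) -> Dict[str, Tuple[float, float, float]]:
    """Score every 5W field of one article; missing fields count as empty."""
    return {
        field: score_field(gold.get(field, set()), predicted.get(field, set()))
        for field in FIELDS
    }


# Invented values, for illustration only (not data from the study).
gold_article = {"who": {"Juan", "Maria"}, "where": {"Maynila"}}
machine_article = {"who": {"Juan", "Pedro"}, "where": {"Maynila"}}

scores = score_article(gold_article, machine_article)
```

Scoring each field separately would show which of the 5Ws the extraction rules handle well and which need more work; a more lenient partial-match criterion could be substituted in `score_field` without changing the rest of the sketch.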