Abstract

Today, a number of automated methods exist to augment predictive models with annotations when working with large collections of unstructured text. One such method involves the use of machine learning techniques. This work investigates the use of those techniques for annotations on UIMA, a framework for the analysis of unstructured data. Following the design science research methodology, an annotator pipeline is created as the main artifact. It takes news and blog articles as input and extracts entities, which are then annotated with a sentiment. In parallel, the pipeline is demonstrated using a set of 12,480 gold-annotated documents on German car manufacturers as training data. All results are tested and evaluated using multiple validation methods and machine learning algorithms. As a result, this work provides a blueprint for research into the use of UIMA and machine learning techniques for domain-constrained datasets.

Machine Learning Techniques for Annotations of Large Financial Text Datasets
