Abstract

Existing work on mining text data for business intelligence is insufficient for many business applications, which need to analyze text data at a fine-grained level. Semantic annotation technology offers this capability, but current methods suffer from a serious problem: they require a large amount of semantically annotated training examples, which prevents semantic annotation technology from being used in practice. In this paper, an active learning method is designed to reduce the number of training examples required. Through an analysis of the version space of the large margin method, several query functions are proposed that quickly reduce the version space and learn the optimal classifier using far fewer labeled training examples. Two empirical evaluations show that the method can reduce the amount of labeled training data by as much as 50%. This study not only contributes to the theory of active learning, but also has important implications for real business applications.
