Analytics and Data Science

Mechanisms for Automatic Training Data Labeling for Machine Learning

Paper Type

full

Description

One of the most pervasive challenges in adopting machine learning or deep learning is the scarcity of training data. This problem is amplified in IS research, where application domains usually require specialized knowledge. This study compares three systems for creating a large training dataset when only a small amount of human-labeled data is available: a high-precision LSTM classifier, a high-recall LSTM classifier, and a manually created rule-based system. Starting from fewer than 20,000 human-labeled training examples, we used automated labeling to add 100,000 examples to the training data. We found that combining a small human-labeled dataset with a system-labeled dataset improves classification performance. In our evaluation, adding training data labeled by the high-recall LSTM to the human-labeled dataset achieved an F1 of 0.578, and adding training data labeled by the rule-based system achieved an F1 of 0.598, an improvement of over 4% compared to a baseline system that uses only human-labeled data.
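The labeling-and-combining step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword rules, the function names (`rule_based_label`, `threshold_label`, `augment`, `f1`), and the binary label scheme are all assumptions made for illustration. The sketch shows a rule-based labeler that abstains when no rule fires, a score threshold that trades precision against recall, the merge of human-labeled and system-labeled examples, and the F1 computation used for evaluation.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, int]  # (text, label); 1 = positive, 0 = negative (assumed scheme)


def rule_based_label(text: str) -> Optional[int]:
    """Hypothetical hand-written rules; the keywords are illustrative only."""
    lowered = text.lower()
    if "refund" in lowered:
        return 1
    if "thanks" in lowered:
        return 0
    return None  # abstain when no rule fires


def threshold_label(score: float, threshold: float) -> int:
    """Turn a classifier score into a label.

    A higher threshold favors precision; a lower one favors recall.
    """
    return 1 if score >= threshold else 0


def augment(
    human: List[Example],
    unlabeled: List[str],
    labeler: Callable[[str], Optional[int]],
) -> List[Example]:
    """Append system-labeled examples to the human-labeled set."""
    system = [(text, labeler(text)) for text in unlabeled]
    # Keep only the examples the labeler did not abstain on.
    return human + [(text, label) for text, label in system if label is not None]


def f1(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, a classifier would be retrained on the output of `augment` and compared, using `f1` on a held-out human-labeled test set, against a baseline trained on the human-labeled examples alone.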

Recommended Citation

Gu, Yang and Leroy, Gondy, "Mechanisms for Automatic Training Data Labeling for Machine Learning" (2019). ICIS 2019 Proceedings. 29.
https://aisel.aisnet.org/icis2019/data_science/data_science/29
