Wirtschaftsinformatik 2023 Proceedings

IMPACT OF DATA COLLECTION ON ML MODELS: ANALYZING DIFFERENCES OF BIASES BETWEEN LOW- VS. HIGH-SKILLED ANNOTATORS

Johannes Schneider, University of Liechtenstein, Germany
Daniel Eisenhardt, Ruhr-Universität Bochum, Germany
Christian Utama, Freie Universität Berlin, Germany
Christian Meske, Ruhr-Universität Bochum, Germany

Abstract

Labeled data is crucial for the success of machine learning-based artificial intelligence. However, companies often face a choice between collecting few annotations from high- or low-skilled annotators, who may exhibit different biases. This study investigates differences in biases between datasets labeled by these annotator groups and their impact on machine learning models. To this end, we created datasets annotated by high- and low-skilled annotators, measured the biases they contain through entropy, and trained different machine learning models to examine bias inheritance effects. Our findings on text sentiment annotations show that both groups exhibit a considerable amount of bias in their annotations, although the error types commonly encountered differ significantly between the groups. Models trained on biased annotations produce significantly different predictions, indicating bias propagation, and tend to make more extreme errors than humans. As partial mitigation, we propose and show the efficiency of a hybrid approach in which data is labeled by both low-skilled and high-skilled workers.

Paper Number

164

Comments

Track 5: Data Science & Business Analytics

Recommended Citation

Schneider, Johannes; Eisenhardt, Daniel; Utama, Christian; and Meske, Christian, "IMPACT OF DATA COLLECTION ON ML MODELS: ANALYZING DIFFERENCES OF BIASES BETWEEN LOW- VS. HIGH-SKILLED ANNOTATORS" (2023). Wirtschaftsinformatik 2023 Proceedings. 15.
https://aisel.aisnet.org/wi2023/15
