Paper Number
1724
Paper Type
Complete Research Paper
Abstract
Large language models such as ChatGPT-4 are said to have major potential in digital education. However, empirical research to date has focused mostly on their disadvantages, such as cheating in exams, while possible advantages, like scoring essay tasks in exams, are typically discussed only in theory. In this study, 100 answers to each of two essay tasks from an exam at a German university were scored by human scorers and by ChatGPT-4. Overall, ChatGPT-4 awarded significantly more points than the human scorers, and this effect was particularly strong for the more complex of the two tasks. Although good answers tend to be longer than bad ones, a high correlation between answer length and ChatGPT-4's scores was demonstrated even for wrong answers. To make the results more interpretable, they were further analyzed using a cluster analysis, which identified four clusters.
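The two statistical steps named in the abstract, a length–score correlation and a cluster analysis yielding four clusters, can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration: the data are synthetic, and the method choices (Pearson correlation, k-means on length and the two scores) are not taken from the paper, which does not spell out its exact pipeline here.

```python
# Illustrative sketch only: synthetic data and assumed methods
# (Pearson correlation, k-means with k=4), not the authors' actual analysis.
import numpy as np
from scipy.stats import pearsonr
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical exam data: answer length in words, human score, ChatGPT-4 score.
length = rng.integers(20, 400, size=100)
human = rng.uniform(0, 10, size=100)
gpt = np.clip(0.02 * length + rng.normal(0, 1, size=100), 0, 10)

# Correlation between answer length and the ChatGPT-4 score.
r, p = pearsonr(length, gpt)
print(f"length vs. ChatGPT-4 score: r = {r:.2f}, p = {p:.3f}")

# Cluster answers on length and both scores; k=4 mirrors the four clusters
# reported in the abstract.
features = np.column_stack([length, human, gpt])
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)
print("answers per cluster:", np.bincount(labels))
```

With data like the abstract describes, a high r for length vs. ChatGPT-4 score even on the subset of wrong answers would indicate that the model rewards length rather than content.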
Recommended Citation
Hartmann, Philipp; Bialas, Sanoa-Amina; Hobert, Sebastian; and Schumann, Matthias, "Does Content Matter? — an Empirical Investigation of ChatGPT-4's Ability to Score Essay Tasks in Exams" (2024). ECIS 2024 Proceedings. 3.
https://aisel.aisnet.org/ecis2024/track13_learning_teach/track13_learning_teach/3
Does Content Matter? — an Empirical Investigation of ChatGPT-4's Ability to Score Essay Tasks in Exams