Abstract

Standardized exams have become established benchmarks for model performance. While this creates an incentive to increase benchmark scores, the task of guaranteeing the integrity of these exams is also emerging. With the rise of smart glasses such as Ray-Ban Meta, employing large language models to cheat appears more trivial and promising than ever before, even in in-person exams. This article evaluates the integration of these models in this real-life scenario and provides overall results for the task, thereby demonstrating the threat in the given scenario and opening a further debate on addressing AI in educational contexts beyond its obvious benefits and known issues, e.g., regarding written assignments. To this end, the performance of state-of-the-art LLMs is evaluated on the task of answering standardized exam questions using a new image set that emulates the use of smart glasses. Based on this preliminary study, a prototype is developed and evaluated further.
