Certamen Artificialis Intelligentia: Evaluating AI in Solving AI-generated Programming Exercises

Abstract

Large language models (LLMs) are transforming programming education by enabling automated generation and evaluation of coding exercises. While previous studies have evaluated LLMs' capabilities in each of these tasks separately, none have explored their effectiveness in solving programming exercises generated by other LLMs. This paper fills that gap by examining how state-of-the-art LLMs (ChatGPT, DeepSeek, Qwen, and Gemini) perform when solving exercises generated by different LLMs. Our study introduces a novel evaluation methodology featuring a structured prompt engineering strategy for generating and executing programming exercises in three widely used programming languages: Python, Java, and JavaScript. The results have both practical and theoretical value. Practically, they help identify which models are more effective at generating exercises and which at solving exercises produced by other LLMs. Theoretically, the study contributes to understanding the role of LLMs as collaborators in creating educational programming content.
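
To make the cross-model setup concrete, the following is a minimal sketch of the kind of evaluation loop the abstract describes, in which each solver model attempts exercises produced by the other models in each target language. The function names (generate_exercise, solve_exercise, run_tests) and the structure of the loop are hypothetical placeholders for illustration, not the paper's actual implementation or any vendor API.

# Illustrative sketch only: prompts, model access, and scoring are abstracted
# away behind caller-supplied functions, which are hypothetical placeholders.
from itertools import product

GENERATORS = ["ChatGPT", "DeepSeek", "Qwen", "Gemini"]
SOLVERS = ["ChatGPT", "DeepSeek", "Qwen", "Gemini"]
LANGUAGES = ["Python", "Java", "JavaScript"]

def cross_evaluate(generate_exercise, solve_exercise, run_tests):
    """Cross-model evaluation loop: each model solves exercises
    produced by every other model, in each target language."""
    results = {}
    for generator, solver, language in product(GENERATORS, SOLVERS, LANGUAGES):
        if generator == solver:
            continue  # focus on exercises generated by *other* LLMs
        exercise = generate_exercise(model=generator, language=language)
        solution = solve_exercise(model=solver, exercise=exercise)
        # Record whether the solver's solution passes the exercise's tests.
        results[(generator, solver, language)] = run_tests(exercise, solution)
    return results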

Recommended Citation

Coppola, C., Perrotta, S., Giuseppe De Vita, C., Mellone, G., Di Luccio, D., Montella, R., Paiva, J.C., Queirós, R., Damaševičius, R., Maskeliūnas, R. & Swacha, J. (2025). Certamen Artificialis Intelligentia: Evaluating AI in Solving AI-generated Programming Exercises. In I. Luković, S. Bjeladinović, B. Delibašić, D. Barać, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Empowering the Interdisciplinary Role of ISD in Addressing Contemporary Issues in Digital Transformation: How Data Science and Generative AI Contributes to ISD (ISD2025 Proceedings). Belgrade, Serbia: University of Gdańsk, Department of Business Informatics & University of Belgrade, Faculty of Organizational Sciences. ISBN: 978-83-972632-1-5. https://doi.org/10.62036/ISD.2025.123

Paper Type

Poster

DOI

10.62036/ISD.2025.123

