Paper Type
Complete
Paper Number
1390
Recommended Citation
Wolters, Anna; Arz von Straussenburg, Arnold F.; and Riehle, Dennis M., "Evaluation Framework for Large Language Model-based Conversational Agents" (2024). PACIS 2024 Proceedings. 14.
https://aisel.aisnet.org/pacis2024/track01_aibussoc/track01_aibussoc/14
Evaluation Framework for Large Language Model-based Conversational Agents
The integration of Large Language Models (LLMs) into Conversational Agents (CAs) marks a significant advancement in the agents’ ability to understand and respond to user queries in a more human-like manner. Despite the widespread adoption of LLMs in these agents, research on standardized evaluation methods remains scarce. Addressing this gap, our study proposes a comprehensive evaluation framework tailored explicitly to LLM-based conversational agents. In a Design Science Research (DSR) project, we construct an evaluation framework that comprises four essential components: the pre-defined objectives of the agent, the corresponding tasks, and the selection of appropriate datasets and metrics. The framework outlines how these elements relate to each other and thereby enables a structured approach to evaluation. We demonstrate how such a framework supports a more systematic evaluation process. This framework can serve as a guiding tool for researchers and developers working with LLM-based conversational agents.
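The abstract describes the framework only at the level of its four components and their relations. The sketch below is our own minimal Python illustration of how objectives, tasks, datasets, and metrics might be wired together for a structured evaluation run; all class names, fields, and the `evaluate` helper are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Illustrative only: names and structure are assumptions, not the paper's artifact.

@dataclass
class Metric:
    """A scoring function applied to an agent's responses."""
    name: str
    score: Callable[[str, str], float]  # (response, reference) -> score

@dataclass
class Dataset:
    """Evaluation data: user queries paired with reference answers."""
    name: str
    examples: List[Dict[str, str]]  # each with "query" and "reference"

@dataclass
class Task:
    """A concrete evaluation task, linked to a dataset and metrics."""
    name: str
    dataset: Dataset
    metrics: List[Metric]

@dataclass
class Objective:
    """A pre-defined objective of the agent, operationalized by tasks."""
    name: str
    tasks: List[Task] = field(default_factory=list)

def evaluate(agent: Callable[[str], str], objectives: List[Objective]) -> Dict[str, Dict[str, float]]:
    """Run every task of every objective and report mean metric scores."""
    report: Dict[str, Dict[str, float]] = {}
    for objective in objectives:
        for task in objective.tasks:
            scores: Dict[str, float] = {}
            for metric in task.metrics:
                values = [
                    metric.score(agent(ex["query"]), ex["reference"])
                    for ex in task.dataset.examples
                ]
                scores[metric.name] = sum(values) / len(values) if values else 0.0
            report[f"{objective.name}/{task.name}"] = scores
    return report
```

In this reading, each objective is operationalized by one or more tasks, and each task pairs a dataset with the metrics used to score an agent's responses on it, which mirrors the relations the abstract names between the four components.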
Comments
AI