Paper Number
1452
Paper Type
Short
Description
The expectation that conversational agents (also known as “chatbots”) communicate with human-like capabilities dates back to the proposal of the “Turing Test” in the 1950s, and it has grown in recent years with technological breakthroughs in large language models such as ChatGPT. However, current evaluation of conversational agents lacks a top-down, theory-driven framework, as most evaluation instruments are created by computer science researchers in a bottom-up manner. As a result, these evaluation dimensions may work well in certain contexts but fail to generalize to human-like conversations. In this paper, we design a novel theory-driven evaluation survey instrument for conversational agents, based on the results of our mapping mechanism between theoretical measures (TMs) in linguistics and existing empirically developed dimensions (EDs). We further identify the most representative EDs for each TM through theory-constrained clustering in an empirical study.
Recommended Citation
Zhang, Lining; Sedoc, João; and Levina, Natalia, "Back to Principles: Theory-driven Evaluation of AI-based Conversational Agents" (2024). ICIS 2024 Proceedings. 4.
https://aisel.aisnet.org/icis2024/adv_theory/adv_theory/4
Back to Principles: Theory-driven Evaluation of AI-based Conversational Agents
Comments
20-Theory