Paper Type

Complete

Abstract

This study investigates how verbal and visual instructional formats influence human-AI collaboration in augmented reality (AR) environments designed for technical diagnosis. Drawing on dual-coding theory, the authors distinguish between inductive and deductive instructions and between schematic and structural illustrations, proposing that cognitive alignment between these modalities, termed processing consistency, enhances performance. Two experiments revealed that inductive instructions improve diagnostic efficiency when paired with schematic visuals but reduce it when paired with structural visuals; conversely, deductive instructions paired with structural visuals enhance diagnostic effectiveness. The study also examined how cognitive effort and emotional confusion mediate these effects, finding that humanoid verbalizations engage both mechanisms, while robotic voices primarily evoke emotional confusion. These findings offer theoretical insights into multimodal information processing and practical guidance for designing effective AR-based AI systems for collaborative tasks, particularly in contexts requiring critical decision-making and spatial reasoning.

Paper Number

1378

Author Connect URL

https://authorconnect.aisnet.org/conferences/AMCIS2025/papers/1378

Comments

SIGHCI

Presentation Date

Aug 15th, 12:00 AM

Title

Empowering Human-AI Collaboration with AR: The Importance of Visual Illustration and Verbalized Instruction
