Paper Number

ECIS2026-1879

Paper Type

CRP

Abstract

Predictive multimodal AI integrates diverse data types, including text, images, and audio, to produce a single predictive output. Although multimodality improves performance, reasoning across modalities simultaneously increases model opacity, introducing challenges for explainable AI (XAI) beyond those of unimodal AI. Technically, multimodal explanations must capture how fused heterogeneous data streams influence predictions. Cognitively, multimodality requires selecting which modalities, relationships, and granularity to convey to support human understanding. Yet existing research lacks a framework for how the informational content of multimodal explanations shapes human reasoning. Addressing this gap, we conduct a conceptual review of predictive multimodal XAI as both a technical and a human reasoning challenge. We identify four clusters of multimodal XAI approaches, distinguished by what cross-modal information they convey and in what form and granularity. Drawing on cognitive psychology, we conjecture how each cluster affects human causal connection and explanation selection, guiding future human-centric multimodal XAI research and design.

Jun 14th, 12:00 AM

Explaining Multimodal AI Predictions: A Conceptual Review
