Paper Number

ICIS2025-2063

Paper Type

Short

Abstract

Large Behavioral Models (LBMs) offer promise for developing socially intelligent robots capable of adaptive, multimodal interaction. Yet their progress is constrained by fragmented pipelines and insufficiently grounded inputs. We argue that LBMs must be trained through the same communicative channels humans use—verbal and non-verbal (gaze, gesture, posture, affect, timing). Using a clinical problematization approach, we examine upstream pipelines that distill raw signals into structured, socially meaningful cues. Foundational models—particularly LLMs and VLMs—can transduce speech, text, images, and video into compact social state variables, but their limitations remain underexplored. Across five diagnostic probes, we find limitations in temporal coherence and cross-modal fusion. By surfacing constraints, this study establishes diagnostic groundwork for architectural innovation and advances an agenda to bridge the gap between current model capacity and the demands of socially intelligent robotics.

Comments

13-DesignDevPM

Dec 14th, 12:00 AM

Training Socially Intelligent Robots with Large Behavioral Models: Challenges, Strategies, and Future Research Opportunities
