Abstract
Short-form video platforms are key venues for healthcare support, yet their multimodal format, combining visual, auditory, and textual channels, complicates scalable analysis. We present a framework grounded in Social Support Theory, implemented with a Masked Ordinal Expectation-Maximization algorithm that integrates modality-specific annotations with expert guidance. A 2×2 design varying architecture (orchestrated vs. holistic) and prompt structure (focused vs. combined) reveals a trade-off. The orchestrated framework paired with focused prompts yields the highest accuracy in classifying support types, whereas the holistic approach with combined prompts better captures relational patterns. This trade-off appears only in high-capability models; simpler models lack the capacity to benefit from orchestration. These findings provide methodological guidance for multimodal analysis and practical insights for building systems that more effectively detect and recommend supportive health content online.
Recommended Citation
Wang, Xiangyu and Mai, Feng, "Orchestrating Multimodal AI Models to Analyze Supportive Health Communication in Short Videos" (2025). Proceedings of the 2025 Pre-ICIS SIGDSA Symposium. 21.
https://aisel.aisnet.org/sigdsa2025/21