Abstract

Short-form video platforms are key venues for healthcare support, yet their multimodal form, combining visual, auditory, and textual channels, complicates scalable analysis. We present a framework grounded in Social Support Theory, implemented with a Masked Ordinal Expectation-Maximization algorithm that integrates modality-specific annotations with expert guidance. A 2×2 design varying architecture (orchestrated vs. holistic) and prompt structure (focused vs. combined) reveals clear trade-offs. The orchestrated framework paired with focused prompts yields the highest accuracy in classifying support types, whereas the holistic approach with combined prompts better captures relational patterns. This trade-off appears only in high-capability models; simpler models lack the capacity to benefit from orchestration. These findings provide methodological guidance for multimodal analysis and practical insights for building systems that more effectively detect and recommend supportive health content online.
