Paper Number
ICIS2025-2634
Paper Type
Short
Abstract
Game live-streaming has become a popular form of entertainment, and its real-time interaction boosts viewer engagement and sponsorship behavior. However, previous research has rarely explored how interactions gradually develop, intensify through emotional resonance, and lead to specific outcomes over time. This study introduces a multimodal analytical framework that combines signals from live streams and uses a multimodal large language model with Chain-of-Thought (CoT) reasoning to build a step-by-step inference process that uncovers the emotional and group dynamics behind interactions. The framework comprises four tasks: detecting shared moods among viewers, identifying collective effervescence, evaluating the resulting interaction outcomes, and estimating their likelihood of continuing in future streams. Together, these elements offer a more transparent and well-structured view of live-streaming interactions and highlight the practical use of multimodal large language models in analyzing intricate, cross-modal interaction patterns.
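The four-task, step-by-step inference process described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the task identifiers paraphrase the abstract, while the prompt wording, the signal names ("chat", "audio"), and the helpers `build_prompt`, `run_chain`, and `placeholder_model` are all hypothetical. The model is any callable that maps a prompt string to an answer string, so no specific multimodal LLM API is assumed.

```python
from typing import Callable, Dict, List, Tuple

# The four tasks named in the abstract, in order.
# The question wording is a hypothetical paraphrase, not the paper's prompts.
TASKS: List[Tuple[str, str]] = [
    ("shared_mood", "What shared mood do viewers show?"),
    ("collective_effervescence", "Is collective effervescence present?"),
    ("interaction_outcome", "What interaction outcomes result?"),
    ("continuation_likelihood", "How likely is continuation in future streams?"),
]


def build_prompt(signals: Dict[str, str], question: str,
                 history: List[Tuple[str, str]]) -> str:
    """Assemble one step's prompt from the signals and all earlier answers."""
    lines = ["Signals from the live stream:"]
    for name, value in signals.items():
        lines.append(f"- {name}: {value}")
    if history:
        # Chain-of-Thought style: expose earlier conclusions to the next step.
        lines.append("Earlier reasoning steps:")
        for task, answer in history:
            lines.append(f"- {task}: {answer}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


def run_chain(signals: Dict[str, str],
              model: Callable[[str], str]) -> Dict[str, str]:
    """Run the four tasks in order, feeding each answer into later prompts."""
    history: List[Tuple[str, str]] = []
    for task, question in TASKS:
        answer = model(build_prompt(signals, question, history))
        history.append((task, answer))
    return dict(history)


def placeholder_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption): echoes the question line."""
    return "answer to: " + prompt.splitlines()[-1]


if __name__ == "__main__":
    # Hypothetical text summaries of multimodal signals.
    result = run_chain({"chat": "GG!", "audio": "cheering"}, placeholder_model)
    for task, answer in result.items():
        print(task, "->", answer)
```

Passing the model in as a plain callable keeps the chaining logic separate from whichever model is actually queried, and makes the intermediate answer of each task inspectable, which is the transparency the abstract emphasizes.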
Recommended Citation
Yeh, Chao-Chuan and Hsiang, Chien-Yi, "Ritual in Live Streaming: Reasoning Interaction through Multimodal Large Language Model" (2025). ICIS 2025 Proceedings. 28.
https://aisel.aisnet.org/icis2025/sharing_econ/sharing_econ/28
Ritual in Live Streaming: Reasoning Interaction through Multimodal Large Language Model
Comments
19-SharingEconomy