As competition in the video market intensifies, video platforms are eagerly seeking new ways to encourage user engagement and enhance user experience through multiple means and channels. The widespread application of danmu (user comments that scroll across the video screen, synchronized with playback time) has attracted the attention of both academics and practitioners. However, research on danmu has lagged behind practical interest. The present study conceptualizes danmu through the lens of discourse architecture and explains the mechanism by which danmu's proximity influences users' perceptions of video quality of experience (QoE) and their engagement. We propose that two technical features of danmu architecture, namely spatial proximity and temporal proximity, play essential roles in enhancing user experience (i.e., enjoyment, information assimilation, and perceived visual quality) and engagement. Additionally, we suggest that spatial proximity improves user experience at the expense of video visual quality. We test the research model in a laboratory experiment with a 3 × 2 design.