Abstract

With the advance of GenAI, the production and dissemination of disinformation have become faster and easier than ever. Among all kinds of disinformation, Deepfakes pose a particularly serious threat because of their vivid realism and the credibility they borrow from public figures, celebrities, and real scenes. Although the warning label is a common mechanism deployed by social media platforms to flag disinformation, extant research comes mostly from the political and public health domains, and the discussion in business and management is relatively sparse. Furthermore, current practice adopts single-modality (i.e., plain-text) warning labels to combat disinformation; whether such a modality mismatch still works against multimodal Deepfakes is questionable. Finally, the recent removal of warning labels by X and Meta suggests that, through the practitioner's lens, their effectiveness is in doubt. With these gaps in mind, the present work investigates whether the multimodal component matters in mitigating Deepfakes and which combination of modality and argument type yields the best performance. This study is rooted in attentional capture and control theory, which suggests that a salient distractor grabs observers' attention and interrupts ongoing tasks; media richness theory, which suggests that rich media such as video convey visual and audio nonverbal cues that enhance the believability of a message; and source credibility theory, which suggests that the perceived attractiveness, trustworthiness, and expertise of the source influence the credibility of the message. Following this reasoning, the study hypothesizes that a warning label with multimodal parity (i.e., video-based) has the strongest effect in offsetting the impact of Deepfakes, and that, holding modality constant, the forensic argument outperforms source-based and content-based arguments because it provides trustworthy, expert hard evidence. However, when modality interacts with argument type, the picture may change: although this ranking of argument types may hold under multimodal mismatch, for multimodal-parity (video-based) warning labels, citing forensic evidence might trigger the audience's critical thinking about the authenticity of the warning label itself, thereby lowering its effectiveness. In that condition, disclosing source-based evidence (e.g., the agent's post history, connections, or social network) might be more persuasive, since the warning label retains its attention-grabbing and believable elements while simultaneously undermining the credibility of the target Deepfake. The present work will conduct a 3 x 3 between-subjects experiment to examine how modality (text/image/video) and argument type (forensic/source-based/content-based) in the warning label design affect participants' perception of and engagement with the Deepfake content and the target brand. A minimum of 300 participants will be recruited and randomly assigned to one of the nine experimental conditions or a control group. The main effects of modality and argument type, as well as their interaction, will be tested using MANOVA. By doing so, this research expects to fill gaps for both academics and practitioners and to offer quantitative evidence in response to the removal of warning labels by social media platforms.
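To make the planned analysis concrete, the sketch below shows how the 3 x 3 between-subjects MANOVA could be run in Python with statsmodels. It uses simulated placeholder data and hypothetical variable names (perceived_credibility, engagement_intention) standing in for the study's actual measures, which are not specified in the abstract; it is a minimal illustration under those assumptions, not the authors' analysis script.

```python
# Minimal sketch of the planned 3 x 3 MANOVA, assuming hypothetical column names
# (modality, argument_type, perceived_credibility, engagement_intention).
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(42)
n_per_cell = 34  # ~300 participants spread over 9 cells (control group analyzed separately)

# Simulated placeholder responses standing in for the collected data.
rows = []
for modality in ["text", "image", "video"]:
    for argument in ["forensic", "source", "content"]:
        for _ in range(n_per_cell):
            rows.append({
                "modality": modality,
                "argument_type": argument,
                "perceived_credibility": rng.normal(4.0, 1.0),  # e.g., 7-point scale composite
                "engagement_intention": rng.normal(3.5, 1.0),
            })
df = pd.DataFrame(rows)

# Two manipulated factors and their interaction predicting the set of dependent variables.
mv = MANOVA.from_formula(
    "perceived_credibility + engagement_intention ~ modality * argument_type",
    data=df,
)
print(mv.mv_test())  # Wilks' lambda, Pillai's trace, etc. for main and interaction effects
```

Follow-up univariate ANOVAs or simple-effects tests per dependent variable would typically accompany a significant multivariate effect, but those are left out of this sketch.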
