Management Information Systems Quarterly

Automatically Detecting Voice Phishing: A Large Audio Model Approach

Abstract

Phishing attacks remain one of the most prevalent cybersecurity concerns. Voice phishing (i.e., vishing) is an emerging type of phishing attack in which malicious actors use audio channels to steal sensitive information from victims. However, vishing detection is challenging because of its real-time nature and the limited availability of datasets. To help address this challenge, this study proposes the vishing generative pretrained transformer (VishGPT). VishGPT adopts the computational design paradigm and incorporates novel reinforcement learning-based large language model fine-tuning and synthetic data model pretraining to automatically detect vishing attempts in real time. We evaluated VishGPT in a series of benchmark experiments, which empirically demonstrated its improvement over state-of-the-art vishing detection and audio classification models. The results suggest that VishGPT achieved state-of-the-art performance in terms of accuracy (86.18%), precision (90.63%), recall (85.02%), and F1-score (87.74%). VishGPT offers practical value to cybersecurity professionals, end users, and academia. Additionally, VishGPT contributes important design principles to the information systems knowledge base in the form of a custom proximal policy optimization (PPO) reward function and synthetic pretraining.
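
For readers who want to see how the reported metrics relate, the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch, not the authors' evaluation code: the function name `f1_score` and the variable names are illustrative assumptions, and only the precision and recall values come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
reported_precision = 0.9063
reported_recall = 0.8502

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 87.74%"
```

Under this standard definition, the reported precision and recall reproduce the reported F1-score of 87.74%.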

Recommended Citation

Ampel, Benjamin; Samtani, Sagar; and Chen, Hsinchun. 2026. "Automatically Detecting Voice Phishing: A Large Audio Model Approach," MIS Quarterly, (50:2), pp. 527-556.
