Abstract
Phishing remains one of the most persistent and evolving cyber threats, targeting both individuals and organizations. The most common phishing techniques rely on social engineering tactics that exploit human vulnerabilities and bypass traditional anti-phishing filters (Bera and Kim 2025). Training and awareness help users detect phishing emails that bypass the automated filters and reach their inboxes, but the bypass itself exposes a weakness in those filters. This weakness calls for research to deepen knowledge on improving the effectiveness of existing AI/ML-based anti-phishing filters. However, existing phishing research has focused on either manual or purely technical approaches, such as blacklisting, whitelisting, and ML-based classification. More research is needed that integrates the analysis of psychological features and dynamically includes them in the feature blacklist used by dynamic classification techniques. Only a few recent studies have used psychological and design-based features, together with reinforcement learning approaches, to detect new phishing features (Smadi et al. 2018); the latest list contains 50 features. Extending the existing body of knowledge, our study focuses on the following research questions. RQ1: How can a phishing detection system be improved so that it adapts itself to identify new phishing features? RQ2: Are there additional features that can enrich the existing feature list and improve the effectiveness of automated anti-phishing filters? To answer these RQs, we propose a dynamic phishing detection system using a multimodal approach that can adapt itself to detect new phishing cues. In an initial pilot study, we identified 14 additional features, expanding the existing list to a total of 64 email and URL features.
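The abstract does not specify how newly identified cues are added to the feature list. One minimal way to picture a feature list that can grow at runtime is a registry of named extractor functions, sketched below in Python. The `Email` record, the `FeatureRegistry` class, and every feature name and keyword are hypothetical illustrations, not the study's 64 features or its implementation.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical, simplified email record: header (subject, salutation)
# and body (text, URLs, attachment file names).
@dataclass
class Email:
    subject: str
    salutation: str
    body: str
    urls: List[str] = field(default_factory=list)
    attachments: List[str] = field(default_factory=list)


FeatureFn = Callable[[Email], float]


class FeatureRegistry:
    """Ordered registry of named feature extractors that can grow at runtime."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureFn] = {}

    def register(self, name: str, fn: FeatureFn) -> None:
        # A newly identified cue becomes one more column in every future vector.
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = fn

    @property
    def names(self) -> List[str]:
        return list(self._features)

    def vectorize(self, email: Email) -> List[float]:
        # dicts keep insertion order, so column positions stay stable.
        return [float(fn(email)) for fn in self._features.values()]


# Hypothetical cue patterns, for illustration only.
URGENCY = re.compile(r"\b(urgent|immediately|suspended|verify)\b", re.IGNORECASE)
IP_URL = re.compile(r"^https?://\d{1,3}(?:\.\d{1,3}){3}(?:[:/]|$)")

registry = FeatureRegistry()
registry.register("subject_urgency", lambda e: bool(URGENCY.search(e.subject)))
registry.register(
    "generic_salutation",
    lambda e: e.salutation.strip().lower() in {"dear customer", "dear user"},
)
registry.register("url_count", lambda e: len(e.urls))
registry.register("ip_based_url", lambda e: any(IP_URL.match(u) for u in e.urls))

# Later, a newly observed cue is appended without touching existing extractors.
registry.register(
    "executable_attachment",
    lambda e: any(a.lower().endswith((".exe", ".scr", ".js")) for a in e.attachments),
)

if __name__ == "__main__":
    msg = Email(
        subject="URGENT: verify your account",
        salutation="Dear Customer",
        body="Your account will be suspended.",
        urls=["http://192.168.10.5/login", "https://example.com/help"],
        attachments=["invoice.pdf.exe"],
    )
    print(registry.names)
    print(registry.vectorize(msg))  # [1.0, 1.0, 2.0, 1.0, 1.0]
```

Because the registry preserves insertion order, vectors produced after a new feature is registered differ from earlier ones only by the appended columns; a downstream classifier would still need retraining, or a model that tolerates new inputs, before it can use the added cue.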
In identifying the features, we focused on the header (subject, salutation) and body (text, URL, attachment) of emails (spam and genuine) collected from five publicly available benchmark datasets, with counts given as (spam, genuine): LingSpam (481, 2,401), Enron (46,502, 43,000), Nazario's emails (3,710 phishing), SpamAssassin (500, 4,951), and spear phishing emails (1,370 spam) (El Aassal et al. 2020; Kaggle 2019; Metsis et al. 2006; Nazario 2016; Sakkis et al. 2003). In addition, 26,722 blacklisted URLs were collected from PhishTank (OpenDNS 2016). Using the newly prepared feature list as input, this study is developing an ensemble multi-model approach that integrates four combinations of algorithms, each applied to the corresponding part of the emails and URLs, to classify phishing emails: Self-Training BERT + SimCSE for text analysis and classification; Graph Neural Networks (GNN) + Label Propagation for URL and hyperlink analysis; Semi-Supervised Variational Autoencoder (SSVAE) + Noisy Student Training for attachment, HTML, and file analysis; and Noisy Student Training for image analysis.
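The abstract does not state how the outputs of the four model combinations are merged into one decision. Weighted soft voting is one common option; the Python sketch below assumes each modality-specific model already outputs a phishing probability. The modality keys, weights, threshold, and the `fuse_scores` function are hypothetical, and the trained models themselves are replaced by stub scores.

```python
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical modality weights; weighted soft voting is an assumed fusion
# rule, not one stated by the study.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "text": 0.4,        # e.g. Self-Training BERT + SimCSE
    "url": 0.3,         # e.g. GNN + Label Propagation
    "attachment": 0.2,  # e.g. SSVAE + Noisy Student Training
    "image": 0.1,       # e.g. Noisy Student Training
}


def fuse_scores(
    scores: Mapping[str, Optional[float]],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Weighted average of per-modality phishing probabilities.

    Modalities missing from an email (score None or absent) are skipped and
    the remaining weights are renormalised, so a text-only email is still
    scored. Returns (fused probability, is_phishing).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for modality, weight in weights.items():
        p = scores.get(modality)
        if p is None:
            continue
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{modality} score must be a probability, got {p}")
        weighted_sum += weight * p
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no modality produced a score")
    prob = weighted_sum / total_weight
    return prob, prob >= threshold


if __name__ == "__main__":
    # Stub probabilities standing in for the trained models; no image present.
    prob, is_phish = fuse_scores({"text": 0.9, "url": 0.8, "attachment": None})
    print(f"{prob:.3f}", is_phish)  # 0.857 True
```

Renormalising over the modalities that are actually present lets an email without attachments or images still receive a meaningful score, rather than being pulled toward zero by inputs it never had.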
Recommended Citation
Bera, Debalina; Paidipati, Kiran Kumar; and Kim, Dan J., "Enhancing Dynamic Phishing Detection with Enriched Feature List– An Ensemble Modeling-based Detection Approach" (2025). AMCIS 2025 TREOs. 61.
https://aisel.aisnet.org/treos_amcis2025/61
Comments
tpp1247