Paper Type

Complete

Paper Number

PACIS2025-1492

Description

Much malicious email involves social engineering, where the textual content attempts to convince the reader to take action that will ultimately be harmful. Email filters have been developed to detect such attacks, but the first step in doing so is extracting the text itself. HTML and CSS in email, however, can be used by attackers to conceal the malicious content, making the text available to the filter differ from the text that appears to the recipient. This paper presents a study of methods to detect such concealment, with a focus on two general sub-types of concealment identified in earlier work. We first conducted a visual determination of concealment in a sample of several thousand emails and compared that classification with text similarity metrics. The results show important differences in performance between metrics. We further explore erroneous results to understand what causes incorrect determinations, informing the design of future solutions.
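The comparison described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: "hidden" is reduced to inline `display:none` or `visibility:hidden` styles, the similarity metric is word-level Jaccard similarity, and the names `TextExtractor`, `extract_text`, and `jaccard` are hypothetical. The paper's actual extraction method and metrics may differ.

```python
from html.parser import HTMLParser
import re

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Assumption: a deliberately narrow notion of "hidden" (inline styles only).
# Real concealment can use many other HTML/CSS techniques.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)


class TextExtractor(HTMLParser):
    """Collects text nodes; optionally skips subtrees hidden by inline CSS."""

    def __init__(self, skip_hidden):
        super().__init__()
        self.skip_hidden = skip_hidden
        self.hidden_depth = 0  # > 0 while inside a hidden subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        if self.hidden_depth or (self.skip_hidden and HIDDEN_STYLE.search(style)):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.chunks.append(data)


def extract_text(html, skip_hidden):
    """Return whitespace-normalized text, with or without hidden subtrees."""
    parser = TextExtractor(skip_hidden)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def jaccard(a, b):
    """Jaccard similarity of lowercase word sets (1.0 = same vocabulary)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


html = ('<p>Your account is <span style="display:none">not </span>'
        'suspended. Click here.</p>')
raw = extract_text(html, skip_hidden=False)      # what a naive filter extracts
visible = extract_text(html, skip_hidden=True)   # approximation of what is shown
score = jaccard(raw, visible)                    # lower score = more divergence
```

In this sketch, a score well below 1.0 would suggest that the text a filter extracts diverges from the text a recipient sees. Any threshold for flagging an email would have to be chosen empirically.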

Comments

Security

Jul 6th, 12:00 AM

Detecting Malicious Email Content Concealment with Text Similarity Metrics
