Location
Hilton Hawaiian Village, Honolulu, Hawaii
Event Website
https://hicss.hawaii.edu/
Start Date
3-1-2024 12:00 AM
End Date
6-1-2024 12:00 AM
Description
Behavioral malware detection has been shown to be an effective method for detecting malware running on computing hosts. Machine learning (ML) models are often used for this task; they use representative behavioral data from a device to classify an observation as malware or benign. Although these models can perform well, ML models in security are often trained on imbalanced datasets, which yields poor real-world efficacy because the models favor the overrepresented class. Thus, we need a way to augment the underrepresented class. Common data augmentation techniques include SMOTE, data resampling/upsampling, and generative algorithms. In this work, we explore using generative algorithms for this task and compare the results to those obtained using SMOTE and upsampling. Specifically, we feed the less-represented class of data into a Generative Adversarial Network (GAN) to create enough realistic synthetic data to balance the dataset. We show that using a GAN to balance a dataset that favors benign data helps a shallow neural network achieve a higher Area Under the Receiver Operating Characteristic Curve (AUC) and a lower False Positive Rate (FPR).
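As a rough illustration of the baseline and the metrics named in the description, the following minimal sketch shows random upsampling of the underrepresented class together with plain definitions of FPR and AUC. It is not the authors' code and does not implement the GAN or SMOTE. The function names, the label convention (1 = malware, 0 = benign), and the toy data are all assumptions made for this example.

```python
import random


def upsample_minority(X, y, minority_label=1, seed=0):
    """Randomly duplicate minority-class rows until both classes are the same size.

    Assumption: binary labels, with `minority_label` marking the rare class.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority_count = sum(1 for label in y if label != minority_label)
    deficit = majority_count - len(minority)
    if not minority or deficit <= 0:
        # Nothing to duplicate, or the dataset is already balanced.
        return list(X), list(y)
    extra = [rng.choice(minority) for _ in range(deficit)]
    return list(X) + extra, list(y) + [minority_label] * deficit


def false_positive_rate(y_true, y_pred, positive=1):
    """FPR = FP / (FP + TN): the share of negatives flagged as positive."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return fp / (fp + tn) if (fp + tn) else 0.0


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the number
    of samples, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four benign rows and one malware row.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_balanced, y_balanced = upsample_minority(X, y)
```

In a comparison like the one described, the balancing step would be applied to the training split only, so that the reported AUC and FPR are measured on untouched held-out data.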
Recommended Citation
Carter, John; Mancoridis, Spiros; Protopapas, Pavlos; and Galinkin, Erick, "IoT Malware Data Augmentation using a Generative Adversarial Network" (2024). Hawaii International Conference on System Sciences 2024 (HICSS-57). 2.
https://aisel.aisnet.org/hicss-57/st/threat_hunting/2