WISP 2025 Proceedings

Dimensionality Reduction of SOC Data to Efficiently Build Classifiers

Kyle Wright, University at Albany, SUNY
Lakshika Vaishnav, University at Albany, SUNY
Sanjay Goel, University at Albany, SUNY

Abstract

Security operations centers (SOCs) face growing challenges in managing the scale of network log data, much of which is noisy and redundant. This study explores autoencoder-based dimensionality reduction of Zeek connection logs to improve the efficiency of intrusion detection. The dataset comprised 303 features, combining binary encodings of session histories with numeric attributes such as duration and byte counts. Five autoencoder (AE) models, including multilayer perceptron (MLP), convolutional neural network (CNN), and reconstruction-based variants, were developed and aggregated in a stacked design to generate compact latent representations. Four classifiers trained on these compressed features, namely logistic regression (LR), MLP, support vector machine (SVM), and random forest (RF), were evaluated on both split datasets and unseen malware traffic. LR, MLP, and SVM maintained accuracy above 99%, while RF achieved around 90% but improved under compression. Dimensionality reduction sharply reduced training time for LR and SVM and cut storage from 142,509 KB to 11,685 KB. The results show that autoencoders can balance efficiency with high detection accuracy on large-scale security data.
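
To put the reported storage figures in proportion, the short sketch below works out the compression factor and percentage reduction they imply. Only the two KB values come from the abstract; the function name `compression_summary` is a hypothetical helper introduced here for illustration, not part of the authors' code.

```python
def compression_summary(before_kb: float, after_kb: float) -> tuple[float, float]:
    """Return (compression factor, percent reduction) for two storage sizes.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if before_kb <= 0 or after_kb <= 0:
        raise ValueError("sizes must be positive")
    factor = before_kb / after_kb                      # how many times smaller
    reduction_pct = (1 - after_kb / before_kb) * 100   # share of storage saved
    return factor, reduction_pct


# Storage sizes reported in the abstract, in KB.
factor, reduction_pct = compression_summary(142_509, 11_685)
print(f"{factor:.1f}x smaller, {reduction_pct:.1f}% reduction")
# prints "12.2x smaller, 91.8% reduction"
```

By this arithmetic, the compressed representation takes up roughly one twelfth of the original storage.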

Recommended Citation

Wright, Kyle; Vaishnav, Lakshika; and Goel, Sanjay, "Dimensionality Reduction of SOC Data to Efficiently Build Classifiers" (2025). WISP 2025 Proceedings. 12.
https://aisel.aisnet.org/wisp2025/12
