Abstract

Security operations centers face growing challenges in managing the scale of network log data, much of which is noisy and redundant. This study explores autoencoder-based dimensionality reduction of Zeek connection logs to make intrusion detection more efficient. The dataset comprised 303 features, combining binary encodings of session histories with numeric attributes such as duration and byte counts. Five autoencoder (AE) models, including MLP-based, CNN-based, and reconstruction-based variants, were aggregated in a stacked design to generate compact latent representations. Four classifiers were trained on these compressed features: logistic regression (LR), multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF). Each was evaluated on both split datasets and unseen malware traffic. LR, MLP, and SVM maintained accuracy above 99%, while RF achieved around 90% but improved under compression. Dimensionality reduction sharply reduced training time for LR and SVM and cut storage from 142,509 KB to 11,685 KB, roughly a 12-fold reduction. The results show that autoencoders can balance efficiency with high detection accuracy in large-scale security data.
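
The compression step summarized above can be illustrated with a minimal sketch. It is not the study's stacked MLP/CNN design: it assumes a single linear autoencoder trained by plain gradient descent, synthetic data standing in for the Zeek features, and a hypothetical latent size of 8. Only the input width of 303 is taken from the abstract. The function name, learning rate, and epoch count are likewise illustrative choices.

```python
import numpy as np

def train_linear_autoencoder(X, latent_dim, epochs=300, lr=0.002, seed=0):
    """Fit a one-layer linear autoencoder by gradient descent.

    Minimizes L = ||X W_enc W_dec - X||^2 / (2 n). This is an illustrative
    stand-in, not the stacked autoencoder described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.01, size=(d, latent_dim))
    W_dec = rng.normal(scale=0.01, size=(latent_dim, d))
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc                 # encode: (n, latent_dim)
        err = Z @ W_dec - X           # reconstruction error: (n, d)
        losses.append(float(np.sum(err ** 2) / (2 * n)))
        grad_dec = Z.T @ err / n                  # dL/dW_dec
        grad_enc = X.T @ (err @ W_dec.T) / n      # dL/dW_enc
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec, losses

# Synthetic data (assumption): 303 features driven by 8 hidden factors plus noise.
rng = np.random.default_rng(42)
n_rows, n_features, latent_dim = 500, 303, 8
signal = rng.normal(size=(n_rows, latent_dim))
mixing = rng.normal(size=(latent_dim, n_features)) / np.sqrt(latent_dim)
X = signal @ mixing + 0.05 * rng.normal(size=(n_rows, n_features))

W_enc, W_dec, losses = train_linear_autoencoder(X, latent_dim)
Z = X @ W_enc  # compressed features that a downstream classifier would consume

print("reconstruction loss, first vs last epoch:", losses[0], losses[-1])
print("in-memory size ratio, raw vs latent:", X.nbytes / Z.nbytes)
```

In this sketch a classifier such as LR or SVM would be trained on `Z` rather than `X`, which is where the savings in training time and storage would come from. The size ratio printed here reflects only the assumed latent width, not the storage figures reported in the study.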
