Abstract

Data binarization converts a continuous data attribute into a finite set of binary attributes while minimizing information loss. It plays a crucial role in feature engineering for data mining. Data binarization simplifies data, improves model training quality, and enhances model performance and the interpretability of results, which helps in understanding complex patterns. In this paper we present an original data binarization framework, called angle-based data binarization, that converts continuous attributes into discrete binary attributes. The proposed framework not only simplifies machine learning models but can also improve the accuracy of a number of well-known traditional machine learning methods. We present the results of an extensive series of experiments that evaluate the efficiency of the proposed method in data classification. Using popular classification algorithms, we compared the classification quality achieved on source datasets with that achieved on their binarized versions. We also discuss binary attribute pruning, based on eliminating attributes with poor discriminative power.

Recommended Citation

Zakrzewicz, M. & Morzy, T. (2025). Angle-Based Data Binarization Framework. In I. Luković, S. Bjeladinović, B. Delibašić, D. Barać, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Empowering the Interdisciplinary Role of ISD in Addressing Contemporary Issues in Digital Transformation: How Data Science and Generative AI Contributes to ISD (ISD2025 Proceedings). Belgrade, Serbia: University of Gdańsk, Department of Business Informatics & University of Belgrade, Faculty of Organizational Sciences. ISBN: 978-83-972632-1-5. https://doi.org/10.62036/ISD.2025.32

Paper Type

Full Paper

DOI

10.62036/ISD.2025.32

