Abstract

Feature selection plays a significant role in the development of information systems related to decision support, such as diagnostic or recommendation systems. Such systems should make it possible to identify the most important features and to analyse data from different locations, taking into account the specific characteristics of the local data sources. Data preparation, including the transformation of attribute domains from continuous values to intervals (discretisation), is an important stage of data analysis because its outcome influences the subsequent stages. This paper proposes an approach to creating a global feature ranking that takes into account the specific characteristics of different discretisation algorithms. A new weight for estimating attribute importance is defined and compared with a measure implemented in a Python library. Both types of weights are used to create a hierarchical structure for the global ranking of features. The experiments were carried out on datasets from the stylometry domain, dedicated to the task of authorship attribution.

Recommended Citation

Zielosko, B., Stańczyk, U., Jabloński, K. & Moshkov, M. (2025). Feature Evaluation Through Decision Trees Structure. In I. Luković, S. Bjeladinović, B. Delibašić, D. Barać, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Empowering the Interdisciplinary Role of ISD in Addressing Contemporary Issues in Digital Transformation: How Data Science and Generative AI Contributes to ISD (ISD2025 Proceedings). Belgrade, Serbia: University of Gdańsk, Department of Business Informatics & University of Belgrade, Faculty of Organizational Sciences. ISBN: 978-83-972632-1-5. https://doi.org/10.62036/ISD.2025.51

Paper Type

Full Paper

DOI

10.62036/ISD.2025.51

