Abstract

The paper proposes a decision tree-based model for dispersed data classification. The dispersed data are stored in tabular form and are collected independently. They may have different objects as well as attributes, but some of them may be common among the tables. The proposed model has a two-level hierarchical architecture that uses decision trees at each level. At the lower level, bagging is used with decision trees for each table. For a classified object, prediction vectors are generated for each table, showing the probabilities that the object belongs to various decision classes. A global tree is trained based on vectors generated for validation set and it makes the final classification for a test object. This paper outlines experimental findings for our proposed approach and contrasts them with established methodologies from the literature. Statistical analysis, based on 16 dispersed data sets, confirms that our model improves classification quality for dispersed data.

Recommended Citation

Przybyła-Kasperek, M., Addo, B.A. & Kusztal, K. (2024). Dual-Level Decision Tree-Based Model for Dispersed Data Classification. In B. Marcinkowski, A. Przybylek, A. Jarzębowicz, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Harnessing Opportunities: Reshaping ISD in the post-COVID-19 and Generative AI Era (ISD2024 Proceedings). Gdańsk, Poland: University of Gdańsk. ISBN: 978-83-972632-0-8. https://doi.org/10.62036/ISD.2024.44

Paper Type

Full Paper

DOI

10.62036/ISD.2024.44

Share

COinS
 

Dual-Level Decision Tree-Based Model for Dispersed Data Classification

The paper proposes a decision tree-based model for dispersed data classification. The dispersed data are stored in tabular form and are collected independently. They may have different objects as well as attributes, but some of them may be common among the tables. The proposed model has a two-level hierarchical architecture that uses decision trees at each level. At the lower level, bagging is used with decision trees for each table. For a classified object, prediction vectors are generated for each table, showing the probabilities that the object belongs to various decision classes. A global tree is trained based on vectors generated for validation set and it makes the final classification for a test object. This paper outlines experimental findings for our proposed approach and contrasts them with established methodologies from the literature. Statistical analysis, based on 16 dispersed data sets, confirms that our model improves classification quality for dispersed data.