Abstract
The paper proposes a decision tree-based model for dispersed data classification. The dispersed data are collected independently and stored in tabular form. The tables may contain different objects and different attributes, although some objects and attributes may be shared among them. The proposed model has a two-level hierarchical architecture that uses decision trees at each level. At the lower level, bagging with decision trees is applied to each table. For an object being classified, a prediction vector is generated for each table, giving the probabilities that the object belongs to each decision class. A global tree is trained on the vectors generated for a validation set, and it makes the final classification of a test object. This paper presents experimental results for our proposed approach and compares them with established methods from the literature. Statistical analysis based on 16 dispersed data sets confirms that our model improves classification quality for dispersed data.
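The two-level pipeline summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gini-based tree, the depth limit, the number of bootstrap trees, concatenating the per-table prediction vectors as the global tree's input, the synthetic data, and the names `build_tree`, `BaggedTrees`, and `DualLevelModel` are all hypothetical choices made for the example.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, classes, depth=0, max_depth=3, min_size=2):
    """Grow a small CART-style tree with Gini splits (assumed); leaves store class probabilities."""
    counts = Counter(y)
    leaf = {"probs": [counts[c] / len(y) for c in classes]}
    if depth >= max_depth or len(y) < min_size or len(counts) == 1:
        return leaf
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return leaf
    _, f, t, left, right = best
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx], classes, depth + 1, max_depth, min_size)
    return {"feature": f, "threshold": t, "left": grow(left), "right": grow(right)}

def tree_proba(tree, x):
    # Walk down to a leaf and return its class-probability vector.
    while "probs" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["probs"]

class BaggedTrees:
    """Lower level: bagging with decision trees on one local table."""
    def __init__(self, n_trees=10, seed=0):
        self.n_trees, self.rng = n_trees, random.Random(seed)

    def fit(self, X, y, classes):
        self.classes, self.trees = classes, []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], classes))
        return self

    def predict_proba(self, x):
        # Prediction vector = average of the leaf distributions over bootstrap trees.
        vecs = [tree_proba(t, x) for t in self.trees]
        return [sum(v[k] for v in vecs) / len(vecs) for k in range(len(self.classes))]

class DualLevelModel:
    def __init__(self, tables, classes):
        # tables: list of (X_local, y_local, attrs); attrs lists the global attribute
        # indices present in that table (hypothetical representation).
        self.classes = classes
        self.local = [(BaggedTrees(seed=k).fit(X, y, classes), attrs)
                      for k, (X, y, attrs) in enumerate(tables)]

    def meta_features(self, x):
        # Concatenate the per-table prediction vectors for object x (assumed combination).
        return [p for model, attrs in self.local for p in model.predict_proba([x[a] for a in attrs])]

    def fit_global(self, X_val, y_val):
        # Upper level: a global tree trained on vectors generated for the validation set.
        self.global_tree = build_tree([self.meta_features(x) for x in X_val], y_val, self.classes)
        return self

    def predict(self, x):
        probs = tree_proba(self.global_tree, self.meta_features(x))
        return self.classes[probs.index(max(probs))]

def make_data(n, rng):
    # Synthetic two-class data with four numeric attributes (illustration only).
    X, y = [], []
    for _ in range(n):
        c = rng.randrange(2)
        X.append([rng.gauss(2.0 * c, 0.7) for _ in range(4)])
        y.append(c)
    return X, y

rng = random.Random(42)
X_train, y_train = make_data(120, rng)
X_val, y_val = make_data(40, rng)
X_test, y_test = make_data(40, rng)

# Three hypothetical local tables: overlapping attribute subsets, disjoint object subsets.
tables = []
for k, attrs in enumerate([[0, 1], [1, 2], [2, 3]]):
    rows = range(k * 40, (k + 1) * 40)
    tables.append(([[X_train[i][a] for a in attrs] for i in rows], [y_train[i] for i in rows], attrs))

model = DualLevelModel(tables, classes=[0, 1]).fit_global(X_val, y_val)
acc = sum(model.predict(x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"test accuracy: {acc:.2f}")
```

In this sketch each local model is trained only on its own objects and attributes, while the global tree is trained only on the concatenated probability vectors computed for the validation objects, so the two levels learn from different data.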
Paper Type
Full Paper
DOI
10.62036/ISD.2024.44
Dual-Level Decision Tree-Based Model for Dispersed Data Classification
Recommended Citation
Przybyła-Kasperek, M., Addo, B.A. & Kusztal, K. (2024). Dual-Level Decision Tree-Based Model for Dispersed Data Classification. In B. Marcinkowski, A. Przybylek, A. Jarzębowicz, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Harnessing Opportunities: Reshaping ISD in the post-COVID-19 and Generative AI Era (ISD2024 Proceedings). Gdańsk, Poland: University of Gdańsk. ISBN: 978-83-972632-0-8. https://doi.org/10.62036/ISD.2024.44