PACIS 2019 Proceedings


Learning from imbalanced data remains a challenging problem despite more than two decades of continuous development in the field. To address it, several data-level and algorithm-level methods have been proposed. Hybrid methods, which combine the advantages of these two groups, are also gaining popularity. In this paper, we therefore focus on new hybrid approaches that combine different sampling strategies with adapted decision trees to tackle binary imbalanced problems. Our experiments consider five preprocessing methods and three asymmetric split criteria, resulting in fifteen evaluated combinations. Unlike the majority of studies, we take the intrinsic data characteristics into account when analyzing each finding, in order to gain a deeper understanding of imbalanced data. The findings, supported by statistical tests, allow us to determine the extent to which sampling can be advantageous when combined with algorithmic solutions.