Abstract

Decision tree pruning is critical for constructing good decision trees. The most popular and widely used pruning method is cost-complexity pruning, which requires a training dataset to grow a full tree and a validation dataset to prune it. However, different pruned trees are produced when the original dataset is randomly partitioned into different training and validation datasets. Which pruned tree is the best? This paper presents an approach derived from Bayes' theorem to select the best pruned tree from a group of pruned trees produced by the cost-complexity pruning method. The results of an experimental study indicate that the proposed approach works satisfactorily in finding the best pruned tree in terms of classification accuracy and performance stability.
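
The sketch below is only an illustration of the phenomenon the abstract describes, not the paper's Bayes-based selection method: with cost-complexity pruning, different random training/validation partitions of the same data can select different pruned trees. It assumes scikit-learn, the Iris dataset, and a simple "pick the alpha with the best validation accuracy" rule, none of which come from the paper.

```python
# Illustrative sketch (not the paper's method): different random
# training/validation partitions can yield different cost-complexity-pruned trees.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # any labeled dataset would do

for seed in range(5):
    # Randomly partition the original dataset into training and validation sets.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )

    # Grow a full tree on the training set and obtain its cost-complexity path.
    full_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    ccp_alphas = full_tree.cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas

    # Prune: refit at each alpha and keep the tree with the best validation accuracy.
    best_acc, best_tree = -1.0, None
    for alpha in ccp_alphas:
        pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
        acc = pruned.score(X_val, y_val)
        if acc > best_acc:
            best_acc, best_tree = acc, pruned

    # Different partitions typically select pruned trees of different sizes.
    print(f"seed={seed}: leaves={best_tree.get_n_leaves()}, val_acc={best_acc:.3f}")
```

Running this for several seeds typically prints pruned trees of different sizes and validation accuracies, which is the ambiguity ("which pruned tree is the best?") that the proposed Bayes-derived selection approach addresses.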
