Abstract

Classification is a frequently encountered data mining problem. Although symbolic classifiers are highly comprehensible, their language bias may limit their classification performance. Adding new features constructed from the original ones may relax this language bias and improve performance. Principal component analysis (PCA), among other methods, has been proposed as a way to enhance the performance of decision trees. However, because PCA is an unsupervised method, the principal components may not be the ideal projection directions for classification. We therefore expect PCA to have varying effects: it may improve classification performance when the projections enhance class differences, but degrade it otherwise. We also posit that PCA has similar effects across symbolic classifiers, including decision rules, decision trees, and decision tables. In this paper, we empirically evaluate the effects of PCA on symbolic classifiers and discuss the findings.
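
The argument that an unsupervised projection can miss class structure can be illustrated with a small constructed example. The sketch below is not the authors' experimental setup: the toy data, the closed-form 2-D PCA, and the use of a one-split decision stump as a minimal stand-in for a decision tree are all illustrative assumptions. The classes differ only along the low-variance direction, so the first principal component (maximum variance) carries no class information, while the second separates the classes perfectly.

```python
import math


def pca_2d(points):
    """Principal axes of 2-D points, largest-variance axis first (closed-form 2x2 eigensolve)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Largest eigenvalue of the symmetric covariance [[sxx, sxy], [sxy, syy]].
    lam1 = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)
    # Two equivalent eigenvector formulas; keep the better-conditioned one.
    a = (lam1 - syy, sxy)
    b = (sxy, lam1 - sxx)
    vx, vy = a if math.hypot(*a) >= math.hypot(*b) else b
    norm = math.hypot(vx, vy)
    if norm == 0.0:  # isotropic data: any orthonormal pair is valid
        vx, vy, norm = 1.0, 0.0, 1.0
    v1 = (vx / norm, vy / norm)
    v2 = (-v1[1], v1[0])  # second component is orthogonal to the first
    return (mx, my), v1, v2


def project(points, center, axis):
    """Coordinates of the mean-centred points along one principal axis."""
    cx, cy = center
    return [(x - cx) * axis[0] + (y - cy) * axis[1] for x, y in points]


def best_stump_accuracy(values, labels):
    """Accuracy of the best single test 'value <= t' (a one-split decision tree)."""
    n = len(values)
    pairs = sorted(zip(values, labels))
    total1 = sum(labels)
    left1 = 0
    best = 0.0
    for i in range(n + 1):  # cut after the first i sorted values
        if i > 0:
            left1 += pairs[i - 1][1]
        if 0 < i < n and pairs[i - 1][0] == pairs[i][0]:
            continue  # a threshold cannot split equal values
        left0 = i - left1
        right1 = total1 - left1
        right0 = (n - i) - right1
        # Either side may be assigned to either class.
        correct = max(left0 + right1, left1 + right0)
        best = max(best, correct / n)
    return best


# Hypothetical toy data: wide spread along x, classes differ only by a shift in y.
points = [(float(x), 0.0) for x in range(-10, 11)] + [(float(x), 1.0) for x in range(-10, 11)]
labels = [0] * 21 + [1] * 21

center, pc1, pc2 = pca_2d(points)
acc_pc1 = best_stump_accuracy(project(points, center, pc1), labels)
acc_pc2 = best_stump_accuracy(project(points, center, pc2), labels)
print(f"PC1 (max variance) stump accuracy: {acc_pc1:.2f}")
print(f"PC2 (min variance) stump accuracy: {acc_pc2:.2f}")
```

Running it prints an accuracy of 0.50 for PC1 and 1.00 for PC2. Reversing the construction (classes separated along the high-variance axis) would make PC1 the useful feature, which is consistent with the expectation that PCA's effect depends on whether its projections happen to align with class differences.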
