Abstract

The number of credit card defaulters is rising year by year, which could push commercial banks into a serious business crisis, so controlling the credit card default rate is important for these banks. Because defaulters make up only a small percentage of cardholders, they are difficult to predict with traditional machine learning algorithms. To address this problem, an improved ensemble learning model is proposed in which the Synthetic Minority Oversampling Technique (SMOTE) oversamples the data set and the Extreme Gradient Boosting algorithm (XGBoost) builds the prediction model. For clarity, this model is called the SMOTE-XGBoost model. Its effectiveness is tested empirically on the customer default data set from the UCI Machine Learning Repository. Ten-fold cross-validation is used to compare the SMOTE-XGBoost model with other models, including a standard XGBoost model and Random Forest, in terms of Recall, accuracy (ACC), and AUC. The empirical results show that the SMOTE-XGBoost model performs well and outperforms the other models.
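The oversampling step can be illustrated with a minimal pure-Python sketch of SMOTE's interpolation rule. This is not the authors' implementation. The function name `smote`, the default of k = 5 neighbours, and the choice of base samples uniformly at random are assumptions made for illustration. A practical pipeline would more likely use an existing library, such as imbalanced-learn's `SMOTE` class, together with an XGBoost classifier.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Sketch of SMOTE: create n_synthetic new minority-class samples.

    minority: list of feature vectors (lists of floats) from the minority class.
    k: number of nearest minority neighbours considered for each sample.
    Assumption: base samples are drawn uniformly at random (illustrative choice).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours (Euclidean) of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nb = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment between x and its chosen neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Usage: add six synthetic points to a toy two-feature minority class.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_synthetic=6, k=2, seed=0)
print(len(new_points))  # 6
```

Each synthetic point is a convex combination of two existing minority samples, so it stays inside the region the minority class already occupies rather than duplicating records as random oversampling does. When oversampling is combined with cross-validation, it is generally applied to the training folds only, so that synthetic points do not leak into the evaluation fold.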
