Abstract

In the fiercely competitive banking industry, accurately predicting whether customers will subscribe to time deposits is vital for banks, because it reduces the time and effort spent on targeted customer service and thereby improves bank efficiency. Traditional prediction methods do not handle imbalanced data well. In this paper, to minimize the impact of imbalanced data, we combine the Synthetic Minority Oversampling Technique (SMOTE) and the Edited Nearest Neighbor technique (ENN) to make the data as balanced as possible. The Extreme Gradient Boosting algorithm (XGBoost) is then adopted as the classifier to improve the accuracy of the classification results. For clarity, we call this model SMOTEENN-XGBoost. Numerical experiments on a bank customer dataset published on the Kaggle platform demonstrate its effectiveness. We compare SMOTEENN-XGBoost with Decision Tree (DT), Adaptive Boosting (AdaBoost), XGBoost, and SMOTE-XGBoost in terms of Accuracy (ACC), Area Under the ROC Curve (AUC), and Geometric mean (G-mean). The results show that the mean ACC, AUC, and G-mean of the SMOTEENN-XGBoost model are 0.92, 0.97, and 0.92, respectively, which are higher than those of the other models. This indicates that the model has good classification performance and can effectively identify potential customers.
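To illustrate why G-mean is reported alongside ACC on imbalanced data, the following is a minimal sketch in plain Python. It assumes the common definition of G-mean as the square root of sensitivity times specificity; the abstract does not spell out the formula used in the paper. The function names and the toy labels are hypothetical and are not taken from the Kaggle dataset.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + tn + fp)


def g_mean(y_true, y_pred):
    """Assumed definition: sqrt(sensitivity * specificity)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy labels: 9 non-subscribers (0) and 1 subscriber (1).
y_true = [0] * 9 + [1]

# A degenerate classifier that always predicts "no subscription".
y_all_negative = [0] * 10

# A classifier that finds the subscriber at the cost of one false alarm.
y_balanced = [0] * 8 + [1, 1]

print(accuracy(y_true, y_all_negative), g_mean(y_true, y_all_negative))
print(accuracy(y_true, y_balanced), g_mean(y_true, y_balanced))
```

In this toy case both classifiers reach the same accuracy, but the always-negative one has a G-mean of zero because it never identifies a subscriber, which is the failure mode that rebalancing the data is meant to avoid.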
