Abstract

In the banking sector, credit risk assessment is an essential operation for ensuring that loans are repaid on time and that banks maintain sound credit performance; despite sustained business effort devoted to credit scoring each year, a high rate of loan default remains a major issue. With the availability of vast amounts of banking data and advanced analytics tools, classification data mining algorithms can be applied to build a credit scoring platform and to address the loan default problem. Using a dataset of 5,960 observations describing the characteristics of loans secured by underlying collateral, the paper sets out a data mining process that compares the performance of four classification algorithms: logistic regression, decision tree, neural network, and XGBoost. Benchmarked via the confusion matrix and Monte Carlo simulation, XGBoost emerges as the most accurate and most profitable model, and it shows high consistency in identifying the major factors that drive default probability in credit scoring.
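The abstract does not spell out how the confusion-matrix and Monte Carlo benchmarks are computed, so the following is only a minimal sketch of one plausible setup, not the paper's method. It assumes label 1 means "default", that a loan is approved whenever a model predicts 0, and a hypothetical payoff in which each approved good loan earns `gain` and each approved bad loan costs `loss`. The Monte Carlo step here is a simple bootstrap of the test set; the function names, payoff values, and toy labels are all illustrative assumptions.

```python
import random
from statistics import mean


def confusion_counts(y_true, y_pred):
    """Count (tp, fp, fn, tn), treating label 1 (default) as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of correctly classified loans, read off the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def profit(y_true, y_pred, gain=1.0, loss=5.0):
    """Hypothetical payoff: loans predicted 0 are approved.
    Approved good loans (tn) earn `gain`; approved bad loans (fn) cost `loss`."""
    _, _, fn, tn = confusion_counts(y_true, y_pred)
    return gain * tn - loss * fn


def monte_carlo_profit(y_true, y_pred, n_sims=1000, seed=0, **payoff):
    """Bootstrap-resample the test set n_sims times; return the mean simulated profit."""
    rng = random.Random(seed)
    n = len(y_true)
    results = []
    for _ in range(n_sims):
        idx = [rng.randrange(n) for _ in range(n)]
        results.append(
            profit([y_true[i] for i in idx], [y_pred[i] for i in idx], **payoff)
        )
    return mean(results)


# Toy hold-out labels and predictions (illustrative only, not from the paper).
y_test = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
predictions = {
    "logistic_regression": [0, 0, 0, 0, 1, 1, 0, 1, 0, 0],
    "xgboost": [0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
}
for name, y_hat in predictions.items():
    print(name, accuracy(y_test, y_hat), monte_carlo_profit(y_test, y_hat))
```

Ranking models by both accuracy and simulated profit mirrors the two criteria the abstract names; because missed defaults (false negatives) are usually far costlier than rejected good loans, the two rankings need not agree in general.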