Abstract

We study credit scoring models for a peer-to-peer (P2P) lending platform, where credit scoring is a core competence of such companies. Two machine learning methods, random forest and XGBoost, are applied in the preprocessing step to select the most useful features and reduce dimensionality, and their effects are compared with those of two traditional feature selection methods. We also examine the influence of macro variables on default probability prediction, namely Gross Regional Product (GRP), Gross Regional Product Index, GRP per capita, average number of employees, average salary of employees, and illiteracy rate. As hypothesized, most of these variables yield a statistically significant improvement in the models' performance in predicting default probability. Moreover, the illiteracy rate is positively correlated with default probability, which suggests that education has an important effect on people's behavior.


Credit Scoring in Peer-to-peer Lending with Macro Variables and Machine Learning as Feature Selection Methods
