Abstract

The modeling of customer features has become a core component of modern financial analytics. Adopting conventional machine learning (ML) methodologies in the finance domain raises several difficulties: distributional asymmetry in the observations, class imbalance in the training labels, and data sparsity resulting from infrequent occurrences. In this study, we attempt to address these statistical challenges of financial data. We then test feature processing using multiple machine learning approaches in combination with established methods. We evaluate separate feature selection results as part of a prediction pipeline and show how they differ across models. Finally, we discuss the empirical implications of feature transformation and selection for the prediction outcomes.

Feature Generation Using Machine Learning from Large Sparse Financial Data
