Abstract
This study applies advanced machine learning techniques to improve money laundering detection using a synthetic dataset that reflects real-world challenges such as extreme class imbalance (~0.1% laundering transactions), high-dimensional features, and temporal patterns. Logistic Regression, LightGBM, and XGBoost models were evaluated, with tuned XGBoost achieving the highest macro F1 score. To mitigate imbalance, the scale_pos_weight parameter of the gradient boosting models was tuned to increase the penalty for misclassifying positive cases, enhancing sensitivity to minority patterns without altering the dataset. Statistical validation with bootstrapped confidence intervals and paired t-tests confirmed the statistical significance of the results. SHAP (SHapley Additive exPlanations) provided global and local interpretability, highlighting the features most influential in predictions. The findings underscore the need for metrics suited to imbalanced data, such as precision-recall AUC and macro F1, for reliable evaluation of money laundering detection models.
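To make the evaluation concerns above concrete, the following is a minimal sketch using only the Python standard library. It is not the authors' code. It assumes a toy label vector with the same ~0.1% positive rate, uses the negative-to-positive count ratio as a common starting value for scale_pos_weight (the study tuned this parameter; the tuned value is not given here), and computes macro F1 with a percentile bootstrap interval. The function names, the toy data, and the bootstrap settings are all illustrative assumptions.

```python
import random


def scale_pos_weight_heuristic(labels):
    """Negative/positive count ratio, a common starting value before tuning."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0:
        raise ValueError("no positive examples")
    return neg / pos


def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class (0.0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores for a binary problem."""
    return (f1_for_class(y_true, y_pred, 0) + f1_for_class(y_true, y_pred, 1)) / 2


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def bootstrap_ci(y_true, y_pred, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for macro F1 (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(macro_f1([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data (assumption): 2,000 transactions, 2 of them laundering (0.1%).
y_true = [0] * 1998 + [1] * 2
# A trivial classifier that never flags anything.
y_pred_all_negative = [0] * 2000

weight = scale_pos_weight_heuristic(y_true)
acc = accuracy(y_true, y_pred_all_negative)
mf1 = macro_f1(y_true, y_pred_all_negative)
ci_low, ci_high = bootstrap_ci(y_true, y_pred_all_negative)

print(f"scale_pos_weight starting value: {weight}")
print(f"accuracy: {acc:.4f}, macro F1: {mf1:.4f}")
print(f"95% bootstrap interval for macro F1: [{ci_low:.4f}, {ci_high:.4f}]")
```

On this toy data the never-flag classifier reaches 0.999 accuracy while its macro F1 is roughly 0.50, which illustrates why accuracy alone is misleading at this level of imbalance.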
Recommended Citation
Chapagain, Aadarsha and Turetken, Ozgur, "Machine Learning for Money Laundering Detection" (2025). Proceedings of the 2025 Pre-ICIS SIGDSA Symposium. 5.
https://aisel.aisnet.org/sigdsa2025/5