Abstract

The global prevalence of diabetes mellitus poses a significant public health challenge. This study combines dimensionality reduction methods with machine learning (ML) algorithms to predict diabetes stage and evaluates the performance of the resulting predictive models. Unlike many diabetes prediction studies, it uses both medical indicators and social determinants of health to predict the risk of diabetes. The analysis draws on a large dataset from the Centers for Disease Control and Prevention, comprising 253,680 instances and 23 features, to which several ML algorithms and dimensionality reduction techniques are applied. Performance is assessed with accuracy, precision, recall, F1 score, the receiver operating characteristic (ROC) curve, area under the curve (AUC), and balanced accuracy. Logistic Regression and XGBoost outperform the other classifiers, achieving an accuracy of 85%. Future work could benefit from incorporating deep learning techniques.
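As a rough illustration of how several of the listed metrics relate to one another, the sketch below computes them from confusion-matrix counts. It is a simplified binary example, not the study's evaluation code: the function name `binary_metrics` and the counts are hypothetical and chosen only to show that accuracy and balanced accuracy can diverge when one class is much larger than the other.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Balanced accuracy is the mean of the per-class recalls.
    balanced_accuracy = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts for illustration only; these are NOT results from the study.
metrics = binary_metrics(tp=20, fp=10, fn=80, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is high because the negative class dominates, while recall and balanced accuracy are much lower, which is one reason to report several metrics rather than accuracy alone.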
