Abstract

This study uses machine learning to predict self-rated health (SRH) among U.S. adults, drawing on the 2023 Behavioral Risk Factor Surveillance System (BRFSS) dataset (n = 117,386). SRH was framed as a binary outcome (Good vs. Poor) and modeled using Logistic Regression, Random Forest, XGBoost, Decision Tree, and a stacked ensemble. To address class imbalance, Edited Nearest Neighbours (ENN), the Synthetic Minority Over-sampling Technique (SMOTE), and repeated under-sampling were compared; repeated under-sampling yielded the most consistent performance. Logistic Regression achieved an AUC of 0.81, and the stacked model improved on this slightly, reaching 0.82. SHAP analysis identified mental health, exercise, income, and diabetes as the top predictors. The findings, interpreted through the Social Determinants of Health framework and the Health Belief Model, emphasize socioeconomic and behavioral influences on perceived health.
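
The abstract does not describe how repeated under-sampling was implemented. As a rough illustration of the general idea only, the sketch below draws several balanced subsets by keeping every minority-class record and randomly sampling an equal number of majority-class records on each repeat. The function name `repeated_undersample`, the toy labels, and the coding of Poor SRH as the minority class (1) are all assumptions made for this example, not details taken from the study.

```python
import random


def repeated_undersample(labels, n_repeats=5, seed=0):
    """Yield balanced index lists by randomly under-sampling the majority class.

    Hypothetical helper for illustration; not the study's implementation.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Whichever class has fewer records is treated as the minority.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    for _ in range(n_repeats):
        # Each repeat draws a fresh majority subset of minority-class size.
        sampled = rng.sample(majority, len(minority))
        yield sorted(minority + sampled)


# Made-up toy labels: 1 = Poor SRH (assumed minority), 0 = Good SRH.
labels = [1] * 20 + [0] * 80
draws = list(repeated_undersample(labels, n_repeats=5, seed=0))
```

In a setup like this, a model would typically be fit on each balanced draw and the evaluation metric (for example AUC on a held-out set) averaged across repeats, which is one plausible reason such a scheme could give more stable results than a single resample.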
