Abstract

In this study, we investigated how the explanations of a machine learning model differ across populations. We considered the problem of predicting type 2 diabetes (T2D), one of the most prevalent health problems worldwide. Using datasets sampled from two distinct populations, we employed a Random Forest (RF) model to predict diabetes onset and calculated feature importance to determine whether the populations contribute differently to the model's decision boundary. The RF model achieved an overall accuracy of 80.09% on the MIMIC-IV dataset and 94.04% on the RHG Chinese dataset. Our findings reveal that, even with the same model, feature contributions can differ across populations, underscoring the need for further investigation into how machine learning models perform on diverse datasets. The findings also highlight the critical role of explainable AI in elucidating the decision-making processes of machine learning models, particularly in healthcare. Assessing the impact of individual features can significantly improve disease prediction and prevention and reduce the opacity of machine learning models.
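The comparison described above can be reproduced in outline with standard tooling. The following is a minimal sketch, not the authors' pipeline: it trains a scikit-learn Random Forest on synthetic data and ranks impurity-based feature importances, the same quantity one would compare between two population cohorts. The feature names and data here are hypothetical placeholders.

```python
# Minimal sketch (not the authors' code): fit a Random Forest and rank
# feature importances. Data and feature names below are synthetic stand-ins
# for a real cohort such as MIMIC-IV or the RHG dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
feature_names = ["glucose", "hba1c", "bmi", "age", "blood_pressure"]  # hypothetical
X = rng.normal(size=(1000, len(feature_names)))
# Synthetic label: driven mostly by the first two features, plus noise.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, rf.predict(X_test)):.2%}")

# Impurity-based importances, sorted in descending order. Repeating this
# per cohort and comparing the rankings is the kind of cross-population
# analysis the abstract describes.
for name, imp in sorted(zip(feature_names, rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```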
