Abstract

Applying machine learning (ML) to mental health prediction is challenged by imbalanced datasets. ML techniques analyse extensive datasets to make predictions; however, when samples are unequally distributed, with the majority belonging to diagnosed mental disorders, model training can become biased and generalisation limited. To mitigate class imbalance in mental health datasets, this study employed diverse ML techniques, namely resampling, ensemble, and algorithm-specific approaches, and evaluated them using metrics such as accuracy, precision, recall and F1 score. The dataset was collected from the Open Sourcing Mental Illness website and spans 2016 to 2021. The findings indicate that ensemble techniques, particularly Random Forest, managed class imbalance better than the other methods. Beyond conventional performance metrics, the study also used Kappa, balanced accuracy, and geometric mean to evaluate model effectiveness. These findings provide valuable insights for improving mental health predictions, supporting early diagnosis and personalised treatment strategies.
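
To illustrate why Kappa, balanced accuracy, and geometric mean are informative on imbalanced data, the following is a minimal sketch of how these metrics can be computed for a binary classifier. It is not the study's code: the function name `imbalance_metrics`, the label encoding (1 for the majority "diagnosed" class) and the toy labels are assumptions made for illustration only.

```python
from math import sqrt


def imbalance_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, balanced accuracy, G-mean and Cohen's kappa.

    Hypothetical helper for illustration; assumes binary labels and that
    both classes occur at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    n = tp + tn + fp + fn

    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    accuracy = (tp + tn) / n

    # Agreement expected by chance, from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_e) / (1 - p_e)

    return {
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": sqrt(sensitivity * specificity),
        "kappa": kappa,
    }


# Toy example (made-up labels): 8 majority-class samples, 2 minority-class
# samples, and a model that always predicts the majority class.
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 10
metrics = imbalance_metrics(y_true, y_pred)
```

In this toy case the always-majority model reaches an accuracy of 0.8, yet its balanced accuracy is 0.5 and both its geometric mean and Kappa are 0, which is the kind of gap between conventional and imbalance-aware metrics that motivates reporting the latter.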
