Abstract

This study investigates the association between socioeconomic factors and the average academic performance of Computer Science programs in Brazil, using microdata from the 2021 National Student Performance Exam (ENADE). The objective is to identify the variables most relevant for classifying programs by performance level, using three classification algorithms: decision tree, random forest, and L1-regularized logistic regression. Preprocessing included one-hot encoding, normalization, and aggregation at the course level. The models were evaluated with accuracy and F1-score under cross-validation. The random forest achieved the best predictive performance, while logistic regression proved more stable. SHAP values were used to interpret the models and highlighted key variables such as family income, the criterion for choosing an institution (reputation vs. cost), weekly study time, vocational motivation, and geographic origin (State of São Paulo). The results underscore the role of socioeconomic conditions in program performance and suggest pathways toward more equitable policies in Brazilian higher education. These findings reflect statistical associations, not causal relationships.
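The preprocessing steps named above (one-hot encoding, normalization, course-level aggregation) can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline. The field names (course_id, income_band, study_hours, score) and the toy records are hypothetical and do not correspond to the actual ENADE 2021 variable codes. The choice of min-max scaling and the order of the steps (encode and normalize per student, then average per course) are also assumptions. After aggregation, each one-hot column becomes the share of a course's students in that category.

```python
from collections import defaultdict

# Hypothetical student-level records: field names and values are illustrative
# only, not the real ENADE 2021 microdata variables.
students = [
    {"course_id": 1, "income_band": "low", "study_hours": 2, "score": 40.0},
    {"course_id": 1, "income_band": "high", "study_hours": 6, "score": 70.0},
    {"course_id": 2, "income_band": "mid", "study_hours": 4, "score": 55.0},
    {"course_id": 2, "income_band": "mid", "study_hours": 8, "score": 65.0},
]


def one_hot(records, field):
    """Replace a categorical field with one 0/1 indicator column per category."""
    categories = sorted({r[field] for r in records})
    out = []
    for r in records:
        row = {k: v for k, v in r.items() if k != field}
        for c in categories:
            row[f"{field}={c}"] = 1 if r[field] == c else 0
        out.append(row)
    return out


def min_max(records, field):
    """Rescale a numeric field to [0, 1] (min-max normalization, assumed here)."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [{**r, field: (r[field] - lo) / span} for r in records]


def aggregate_by_course(records, key="course_id"):
    """Average every non-key column per course, turning student rows into course rows."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    courses = {}
    for cid, rows in groups.items():
        cols = [c for c in rows[0] if c != key]
        courses[cid] = {c: sum(r[c] for r in rows) / len(rows) for c in cols}
    return courses


encoded = one_hot(students, "income_band")
normalized = min_max(encoded, "study_hours")
courses = aggregate_by_course(normalized)
print(courses)
```

In this toy data, course 1 ends up with income_band=low and income_band=high both equal to 0.5, since half of its students fall in each band. Its mean score of 55.0 is the kind of course-level target that would then be binned into performance classes. The binning scheme is not described in the abstract and is not shown here.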
