User engagement (UE) is an important affective state in learning contexts. UE detection based on machine vision has recently gained attention; however, datasets for automated UE detection remain scarce. The ambiguity of the task and the domain-specific features of the data make annotation difficult in this domain, especially for large-scale data. We therefore investigate UE detection on a large volume of data of which only a small fraction is annotated. To this end, we apply a safe semi-supervised support vector machine (S4VM), motivated by the successful use of its purely supervised counterpart, the SVM, in UE detection. For comparison, both SVM and S4VM are applied to the data we collected. S4VM consistently outperforms SVM, and its performance is acceptable compared with results reported in the literature; however, achieving high accuracy in UE detection requires further investigation.
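As a rough illustration of the comparison described above, the sketch below contrasts a supervised linear SVM trained only on a small labeled fraction with a semi-supervised variant that also uses the unlabeled pool. It is not S4VM. S4VM builds several diverse low-density separators and chooses labels for the unlabeled data that maximize the worst-case gain over the supervised SVM. For brevity, a simpler self-training wrapper is swapped in here as a stand-in. All function names, parameters (`lam`, `epochs`, `threshold`, `max_rounds`), and the Pegasos-style training loop are illustrative assumptions, not the method or features used in this work.

```python
import random


def _augment(x):
    # Append a constant 1 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def decision(w, x):
    """Signed SVM score w . [x, 1]; its sign gives the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, _augment(x)))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM trained with Pegasos-style stochastic subgradient steps.

    X is a list of feature vectors; y is a list of labels in {-1, +1}.
    """
    rng = random.Random(seed)
    Xa = [_augment(x) for x in X]
    w = [0.0] * len(Xa[0])
    idx = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            # Regularization step: shrink all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active for this sample: step toward it.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def self_training_svm(X_lab, y_lab, X_unlab, threshold=1.0, max_rounds=5):
    """Self-training wrapper around the SVM (a stand-in, NOT S4VM).

    Each round pseudo-labels unlabeled samples whose |score| >= threshold,
    moves them into the training set, and retrains the SVM.
    """
    X_cur, y_cur = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    w = train_linear_svm(X_cur, y_cur)
    for _ in range(max_rounds):
        scores = [decision(w, x) for x in pool]
        keep = [abs(s) >= threshold for s in scores]
        if not any(keep):
            break  # no confident samples left (or the pool is empty)
        for x, s, k in zip(pool, scores, keep):
            if k:
                X_cur.append(x)
                y_cur.append(1 if s > 0 else -1)
        pool = [x for x, k in zip(pool, keep) if not k]
        w = train_linear_svm(X_cur, y_cur)
    return w


def accuracy(w, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    hits = sum(1 for x, yi in zip(X, y) if (decision(w, x) > 0) == (yi > 0))
    return hits / len(y)
```

In use, one would train `train_linear_svm` on the annotated fraction and `self_training_svm` on the annotated fraction plus the unannotated pool, then compare both with `accuracy` on a held-out labeled set. Unlike S4VM, plain self-training offers no safeness guarantee: confidently wrong pseudo-labels can make it worse than the supervised SVM, which is exactly the risk S4VM is designed to avoid.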