Start Date
12-13-2015
Description
Traditional research in student retention is survey-based, relying on data collected from questionnaires, which is not optimal for proactive prediction and real-time decision (student intervention) support. Machine learning approaches have their own limitations. Therefore, in this research, we propose a big data approach to formulating a predictive model. We used commonly available (student demographic and academic) data in academic institutions augmented by derived implicit social networks from students’ university smart card transactions. Furthermore, we applied a sequence learning method to infer students’ campus integration from their purchasing behaviors. Since student retention data is highly imbalanced, we built a new ensemble classifier to predict students at-risk of dropping out. For model evaluation, we use a real-world dataset of smart card transactions from a large educational institution. The experimental results show that the addition of campus integration and social behavior features refined using the ensemble method significantly improve prediction accuracy and recall.
Recommended Citation
Ram, Sudha; Wang, Yun; Currim, Faiz; and Currim, Sabah, "Using Big Data for Predicting Freshmen Retention" (2015). ICIS 2015 Proceedings. 13.
https://aisel.aisnet.org/icis2015/proceedings/DecisionAnalytics/13
Using Big Data for Predicting Freshmen Retention
Traditional research in student retention is survey-based, relying on data collected from questionnaires, which is not optimal for proactive prediction and real-time decision (student intervention) support. Machine learning approaches have their own limitations. Therefore, in this research, we propose a big data approach to formulating a predictive model. We used commonly available (student demographic and academic) data in academic institutions augmented by derived implicit social networks from students’ university smart card transactions. Furthermore, we applied a sequence learning method to infer students’ campus integration from their purchasing behaviors. Since student retention data is highly imbalanced, we built a new ensemble classifier to predict students at-risk of dropping out. For model evaluation, we use a real-world dataset of smart card transactions from a large educational institution. The experimental results show that the addition of campus integration and social behavior features refined using the ensemble method significantly improve prediction accuracy and recall.