Abstract

Generative AI offers introductory programming students immediate support for debugging, syntax, code explanation, and algorithmic reasoning, but the same tools can also bypass the learning processes that instructors hope to cultivate. This TREO talk presents a mixed-methods learning analytics study of a custom Socratic AI tutor in an introductory Python course. The tutor was designed to withhold completed code and instead provide hints, questions, and reflective prompts grounded in scaffolding, help-seeking, cognitive load, self-efficacy, and self-regulated learning theory (Aleven et al., 2003; Bandura, 1997; Sweller, 1988; Vygotsky & Cole, 1978; Wood et al., 1976). The study links psychometric survey measures of prior programming experience, technical English proficiency, and programming self-efficacy with behavioral features extracted from 95 valid AI-student chat interactions, including user turns, turns to resolution, prompt length, lexical diversity, sentiment, struggle type, and prompting strategy. Findings show that Socratic AI can support productive struggle while also introducing equity-sensitive design challenges. First, results reveal a statistically significant Socratic Gap: prior programming experience moderated the relationship between interaction volume and performance (p = .045). Longer conversations had a slightly negative association with performance for absolute beginners but a positive one for students with prior experience, suggesting that indirect prompts can become cognitive overload when students lack foundational programming schemas. Second, technical English proficiency strongly predicted interactional efficiency. Each one-point increase in proficiency was associated with approximately 3.18 fewer turns to resolution (r = -.467, p < .001), indicating a linguistic time tax for students who supplied shorter, lower-context prompts. Third, frustration loops were rare: 76 of 95 interactions reflected productive struggle and only two reflected frustration loops, although self-efficacy interacted with sentiment in predicting grades (p = .036). Finally, moderate-difficulty assignments showed a significant decline in user turns across the semester (r = -.341, p = .025), with students requiring about 1.40 fewer turns per comparable assignment block. This pattern suggests that the tutor functioned as a scaffold rather than a crutch, supporting growing learner independence. The study contributes to research on AI-assisted programming education by showing that the pedagogical value of LLM tutors depends not simply on access or usage volume, but on the fit between tutor design and learner characteristics. For educators and designers, the results point to three practical needs: adaptive Socratic strictness for absolute beginners, explicit instruction in high-context prompting, and early monitoring of low-sentiment unresolved sessions. More broadly, the talk argues that transcript-level behavioral evidence is essential for evaluating whether AI tutors promote independence, reinforce dependency, or impose hidden burdens on particular learners.
