Abstract

Large Language Models (LLMs), such as ChatGPT and Gemini, are increasingly used not only for routine automation tasks like code generation or grammar checking, but also for reasoning-intensive work, such as interpreting business and legal case studies. For instance, these models now assist students in answering open-ended, case-based questions that require contextual understanding and applied reasoning, skills that go beyond textbook knowledge. While LLMs excel at producing fluent, coherent text, they are trained to generate the most statistically probable sequence of words, which makes them less suited for structured reasoning: drawing conclusions from premises using directional logic, as discussed by the psychologist L. Rips in The Psychology of Proof (1994). This distinction is critical in MBA-style case education, where students must interpret ambiguity, apply theoretical frameworks, weigh trade-offs, and justify their conclusions, tasks LLMs struggle with unless guided by well-designed prompts.

Our paper therefore addresses the following question: What role does prompt design play in enabling LLMs to simulate reasoning in response to complex, context-dependent questions? Prior research has explored how prompts influence the factual accuracy or fluency of LLM responses, but little attention has been paid to how prompts can act as cognitive scaffolds, external structures that guide an LLM's reasoning path. We ground our inquiry in Distributed Cognition Theory (DCT), proposed by the psychologist E. Hutchins in 1995, which holds that cognition is not confined to individual minds but distributed across tools, artifacts, people, and environments. Building on this theory, we propose that prompts function as external cognitive artifacts that shape how LLMs reason. Rather than viewing prompts in the traditional way, as mere input instructions to an LLM, we frame them as mechanisms for redistributing cognitive labor from the user to the LLM.

To test this, we design prompts for 10 business school case questions and submit them to ChatGPT-4o, varying each prompt's vocabulary along two dimensions proposed by Valmeekam et al. (2024), who showed that differences in prompt phrasing affect reasoning in LLMs and Large Reasoning Models (LRMs), advanced LLMs with experimental reasoning capabilities, by using the prompts to engage the models in a cognitive game. We use two prompt types in our test: explicit scaffold prompts with direct, instructive language (e.g., “Apply SWOT analysis...”) and implicit scaffold prompts with persona-based cues (e.g., “How would a strategic thinker respond...”). MBA instructors then evaluate the responses on three indicators of reasoning: theoretical framework use, trade-off analysis, and justification (Valmeekam et al., 2024).

This research makes a key conceptual contribution by reframing prompt engineering as interface design for distributed reasoning. Prompts are positioned not merely as instructions that lead LLMs to generate specific outputs, but as cognitive bridges (structures guiding the direction of logic) that enable generative AI to behave more like a reasoning collaborator than a passive text generator. This shift opens new avenues for IS scholars exploring human–GenAI collaboration, especially in contexts requiring analytical thought.
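To illustrate the two scaffold types described above, the short Python sketch below shows one way such prompt variants could be constructed programmatically. The function names, template wording, and sample case are hypothetical illustrations only and are not taken from the study.

    # Illustrative sketch only: hypothetical helpers that build the two prompt
    # variants (explicit vs. implicit scaffolds) for a business case question.

    def explicit_scaffold(case_text: str) -> str:
        """Direct, instructive wording that names the framework to apply."""
        return (
            "Apply a SWOT analysis to the following business case, weigh the "
            "trade-offs of each option, and justify your recommendation.\n\n"
            f"Case:\n{case_text}"
        )

    def implicit_scaffold(case_text: str) -> str:
        """Persona-based cue that leaves the framework choice to the model."""
        return (
            "How would a strategic thinker respond to the following business "
            "case? Explain the reasoning behind the recommendation.\n\n"
            f"Case:\n{case_text}"
        )

    if __name__ == "__main__":
        case = "A regional retailer is deciding whether to expand into e-commerce."
        for build in (explicit_scaffold, implicit_scaffold):
            print(f"--- {build.__name__} ---")
            print(build(case))
            print()

In such a setup, each generated prompt would then be submitted to the model (here, ChatGPT-4o) and the responses collected for evaluator scoring on the three reasoning indicators.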
