From Hype to Evidence: Evaluating LLM Reliability in Supply Chain Management

Abstract

Large Language Models (LLMs) promise to transform supply chain management (SCM) through improved forecasting and automated decision support (Aggarwal & Davè, 2018). However, generic benchmarks (Maslej et al., 2025) reveal little about domain-specific performance. It therefore remains open whether supply chain managers can trust LLM-derived proposals or whether the field is building on unverified assumptions. We argue that the field needs a dedicated, domain-specific benchmarking approach that accounts for the operational realities of supply chain tasks. Our preliminary work ran repeated forecasting trials with agentic LLM orchestration on self-hosted infrastructure using CrewAI and Retrieval Augmented Generation (RAG) (Lewis et al., 2020). It reveals that LLM-generated forecasts do not outperform traditional algorithmic approaches. This confirms that a gap between LLM potential and domain-specific performance exists and demands systematic, rigorous investigation. We invite the IS community to shape a research agenda for rigorous LLM evaluation in SCM, focusing on dimensions such as accuracy, consistency, contextual fit, and cost efficiency. Configuration choices such as temperature settings, prompt design, and model architecture significantly shape operational outcomes and therefore deserve attention. From a socio-technical perspective (Bostrom & Heinen, 1977), the agenda also includes examining how firms, especially small and medium-sized enterprises (SMEs), can build the evaluation capabilities needed for responsible AI adoption that complies with data sovereignty requirements.
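The abstract does not specify how accuracy and consistency are measured across repeated trials. The following is a minimal sketch that assumes mean absolute error (MAE) as the accuracy measure and the spread of MAE across repeated runs as the consistency measure. It scores several LLM forecast runs against a traditional baseline forecast. The function names and all numbers are hypothetical and purely illustrative. They are not results from the study.

```python
from statistics import mean, pstdev


def mae(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return mean(abs(a - f) for a, f in zip(actual, forecast))


def evaluate_trials(actual, trials, baseline):
    """Summarise accuracy and run-to-run consistency of repeated forecasts.

    Hypothetical helper: `trials` holds one forecast sequence per repeated
    LLM run; `baseline` is a forecast from a traditional method
    (e.g. a naive or moving-average model).
    """
    errors = [mae(actual, t) for t in trials]
    baseline_error = mae(actual, baseline)
    return {
        "mean_mae": mean(errors),        # accuracy averaged over runs
        "mae_spread": pstdev(errors),    # consistency: dispersion of error across runs
        "baseline_mae": baseline_error,  # accuracy of the traditional method
        "beats_baseline": mean(errors) < baseline_error,
    }


# Illustrative, made-up demand data (not from the study).
actual = [100, 120, 110, 130]
trials = [
    [98, 125, 105, 128],   # run 1
    [110, 115, 100, 140],  # run 2
    [95, 130, 112, 125],   # run 3
]
baseline = [100, 100, 120, 110]

result = evaluate_trials(actual, trials, baseline)
print(result)
```

Reporting the error spread alongside the mean error makes run-to-run variability visible. A single-run comparison would hide it, and it is also the quantity that configuration choices such as temperature would be expected to affect.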

Recommended Citation

Cenk, Gökhan; Engel, Tobias; Kreßel, Jonathan; and Andersson, Jonas, "From Hype to Evidence: Evaluating LLM Reliability in Supply Chain Management" (2026). AMCIS 2026 TREOs. 106.
https://aisel.aisnet.org/treos_amcis2026/106

AMCIS 2026 TREOs