Abstract

Large language model (LLM) inference operations now account for 70-90% of production AI compute resources, yet carbon optimization strategies for these workloads have received limited attention. This paper presents a carbon-aware orchestration framework that dynamically shifts LLM inference workloads across temporal and spatial dimensions based on real-time electricity grid carbon intensity data. Our multi-objective optimization approach balances carbon emissions reduction, service level agreement (SLA) compliance, and operational cost management through a Pareto-optimal decision support model. Using hybrid electricity grid data combining historical records from open-source carbon intensity APIs with synthetic augmentation spanning 12 global regions over 30 days, our simulation experiments demonstrate carbon reductions of 35-52% for mixed-latency workloads while maintaining 99.5% SLA compliance. The framework contributes to the Green Information Systems literature by providing actionable sustainability mechanisms for inference-dominated production systems.
