Paper Type

Complete

Abstract

Generative AI is reshaping software development across the software development life cycle (SDLC), including planning, implementation, review, testing, delivery, and maintenance. By reducing the marginal cost of producing code, generative AI increases the relative importance of verification, integration, coordination, and risk management. This shift creates a measurement challenge: traditional activity proxies (e.g., lines of code, commits) can rise even when end-to-end delivery capacity is constrained by review throughput, quality assurance, and architectural fit. Building on multidimensional productivity theory (SPACE and DevEx) and empirical evidence that AI’s effects vary by task structure, developer experience, and workflow context, this study examines how organizations currently measure developer productivity in AI-assisted environments and how practitioners believe measurement should evolve. We report results from an anonymous mixed-methods survey conducted in February 2026, combining descriptive statistics with thematic coding of open-ended responses. Respondents expressed only moderate confidence that current metrics reflect performance under AI assistance and reported widespread reliance on easy-to-instrument volume measures. In contrast, they strongly preferred AI-era indicators that capture the effectiveness of AI usage, time saved, task complexity, and the verification and integration work increasingly central to developer contribution. A large majority favored portfolio-based evaluation to reduce gaming and preserve tradeoffs between speed, quality, and impact. We conclude that AI-era productivity should be assessed as system delivery under verification and coordination constraints, not as individual output volume.

Paper Number

1881

Comments

AI SYSTEM

Aug 15th, 12:00 AM

AI-Era Software Developer Productivity and Performance Metrics
