Abstract
When organizations delegate authority to autonomous AI agents, two assumptions hold the arrangement together: that principals can observe what the agent is doing, and that the agent will flag when something is wrong. This paper argues that both assumptions fail together, closing every meaningful error-correction pathway in the delegation chain. The first failure is opacity. Deep neural networks are structurally unobservable, making the monitoring contracts at the heart of Principal-Agent Theory (PAT) unworkable (Liu et al., 2024; Hadfield-Menell & Hadfield, 2019). PAT treats information asymmetry as a gap that contracts can narrow. For AI, that gap is a structural property of the model: the principal cannot see in. The second failure, sycophancy, follows directly. Agents trained with reinforcement learning from human feedback (RLHF) exhibit what Shapira et al. (2026) formalize as unconditional convergence: a reward structure that systematically favors user agreement over accuracy (Sharma et al., 2023). PAT assumes divergence between principal and agent as the central risk. RLHF produces the inverse: an agent so compliant it can no longer say "I think you are wrong." With both safeguards gone, automation bias seals the loop: principals accept AI outputs without genuine evaluation, a pattern that awareness alone cannot correct (Parasuraman & Manzey, 2010). Flawed instructions execute unchallenged and return as accepted outputs, and the error compounds invisibly. This causal chain (opacity disabling top-down monitoring, sycophancy disabling bottom-up correction, automation bias disabling human detection) is a governance failure that PAT was never designed to address (Eisenhardt, 1989). This paper introduces the dual-failure framing as a foundation for an information systems (IS) governance research agenda, shifting the design burden from model developers to the organizations that structure how these agents operate.
Recommended Citation
Aradhyamath, Harush, "Opacity, Sycophancy, and the AI Governance Gap" (2026). AMCIS 2026 TREOs. 189.
https://aisel.aisnet.org/treos_amcis2026/189