The pervasiveness and increasing sophistication of artificial intelligence (AI)-based artifacts within private, organizational, and social realms change how humans interact with machines. Theorizing about how humans perceive AI-based artifacts is crucial to understanding why, and to what extent, humans deem them competent for tasks such as decision-making, yet such theorizing has traditionally taken a modality-agnostic view. In this paper, we theorize about one particular mode of interaction: voice-based interaction with AI-based artifacts. The capabilities and perceived naturalness of such artifacts, fueled by continuous advances in natural language processing, induce users to deem an artifact able to act autonomously in a goal-oriented manner. We argue that there is a positive direct relationship between the voice capabilities of an artifact and users’ agency attribution, which ultimately obscures the artifact’s true nature and competencies. This relationship is further moderated by the artifact’s actual agency, uncertainty, and user characteristics.