Abstract

Recent advancements in artificial intelligence (AI) have exacerbated concerns about data privacy, and scholars emphasize the risks posed by biases embedded in training data. Building AI systems requires vast amounts of training data, which are often sourced from the public internet, including user-generated media, raising issues of ownership and copyright. Concerns about data ownership and excessive control have given rise to data cooperatives, a new business model enabled by digital platforms. Data cooperatives are member-owned organizations that enable their members to voluntarily pool their data, own it, and control it collectively for their mutual benefit or the benefit of the community. The data cooperative model emphasizes protecting members’ rights and ensuring data fairness (Scholz & Calzada 2021). Because this body of research is nascent, Petreski & Cheong (2024) argue that there is little understanding of how data cooperatives function. We address this gap by exploring how data cooperatives create social impact, especially for marginalized communities, while ensuring the protection and fairness of their members’ data. Our case study is Karya, a company whose name means "task" in Hindi. Since 2021, Karya has involved over 30,000 individuals from rural areas in completing digital tasks such as capturing, labeling, and annotating data for AI training across formats including speech, text, images, and video. Our preliminary findings indicate that these emerging digital platforms leverage cooperative models to create equitable economic opportunities for underserved communities while addressing concerns such as data sovereignty and inclusivity in AI development.
For instance, Karya grants its members ownership of the data they create and allows them to earn additional revenue each time that data is sold. Karya builds its member community by involving women, individuals from low-income groups and other marginalized communities, such as lower castes and religious minorities, and individuals with disabilities who cannot take up other forms of work. Karya also integrates digital work with educational initiatives, addressing the dual challenges of income generation and skill development among low-income communities and giving its members opportunities for upskilling. Because Karya enables its members to generate data in several local dialects, AI systems developed or trained on Karya’s data are more inclusive and perform better in multilingual environments, reducing the systemic bias often present in AI trained primarily on English datasets.

Comments

tpp1320
