Paper Number

ICIS2025-2506

Paper Type

Short

Abstract

Large language models (LLMs) and pretrained language models (PLMs) are increasingly deployed in multilingual contexts, yet their reliance on standardized corpora risks marginalizing communities whose language practices diverge from formal norms. This paper examines how linguistic variation in Swahili—loanwords, code-mixing, tribal lexicons, and youth vernacular (Sheng)—affects model performance and fairness. Using a publicly available dataset of free-text psychometric responses from 2,170 Swahili speakers across more than 20 tribes, we evaluate PLMs (mBERT, AfriBERTa) and LLMs (Qwen, Llama) on prediction tasks. Results show that domain-adapted PLMs outperform general-purpose LLMs on both continuous and binary assessments, yet systematic disparities persist: models exhibit disparate impact across tribal groups and consistent misclassification when sociolinguistic features are present. We extend IS research on sociotechnical systems by foregrounding linguistic variation as a cultural artifact in AI evaluation and propose a framework for assessing fairness in contexts of intra-language diversity.
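The abstract's fairness claim centers on disparate impact, a standard group-fairness metric. As a hedged point of reference (not the paper's own code or method), the sketch below computes a disparate-impact ratio for binary predictions: each group's positive-prediction rate relative to a chosen reference group. The column names `tribe` and `pred` and the reference group are illustrative assumptions, not identifiers from the paper's dataset.

```python
import pandas as pd

def disparate_impact(df: pd.DataFrame, group_col: str, pred_col: str,
                     reference_group: str) -> pd.Series:
    # Positive-prediction rate per group (mean of 0/1 predictions).
    rates = df.groupby(group_col)[pred_col].mean()
    # Ratio of each group's rate to the reference group's rate;
    # values well below 1.0 suggest disparate impact.
    return rates / rates[reference_group]

# Hypothetical usage: 'tribe' and 'pred' are assumed column names,
# not fields from the paper's dataset.
df = pd.DataFrame({
    "tribe": ["A", "A", "B", "B", "B", "C", "C"],
    "pred":  [1, 0, 1, 1, 0, 0, 0],
})
print(disparate_impact(df, "tribe", "pred", reference_group="A"))
```

A ratio near 1.0 indicates parity with the reference group; the commonly cited four-fifths rule treats ratios below 0.8 as a signal of potential disparate impact.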

Comments

05-ResponsibleIS


Title

Cultural Artifacts, Tribal Heterogeneity, and Language Models

