Paper Number
ICIS2025-2506
Paper Type
Short
Abstract
Large language models (LLMs) and pretrained language models (PLMs) are increasingly deployed in multilingual contexts, yet their reliance on standardized corpora risks marginalizing communities whose language practices diverge from formal norms. This paper examines how linguistic variation in Swahili—loanwords, code-mixing, tribal lexicons, and youth vernacular (Sheng)—affects model performance and fairness. Using a publicly available dataset of free-text psychometric responses from 2,170 Swahili speakers across more than 20 tribes, we evaluate PLMs (mBERT, AfriBERTa) and LLMs (Qwen, Llama) on prediction tasks. Results show that domain-adapted PLMs outperform general-purpose LLMs on both continuous and binary assessments, yet systematic disparities persist: models exhibit disparate impact across tribal groups and consistent misclassification when sociolinguistic features are present. We extend IS research on sociotechnical systems by foregrounding linguistic variation as a cultural artifact in AI evaluation and propose a framework for assessing fairness in contexts of intra-language diversity.
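The fairness finding above rests on a disparate impact comparison of model predictions across tribal groups. As a hedged illustration only (the function name, data, and group labels below are hypothetical, not the authors' evaluation code), the following Python sketch shows one common way such a check is computed: the ratio of each group's favorable-prediction rate to a reference group's rate, with ratios below roughly 0.8 flagged under the conventional four-fifths rule.

from collections import defaultdict

def disparate_impact(preds, groups, reference_group):
    """Ratio of each group's favorable-prediction rate to the
    reference group's rate; values below ~0.8 flag disparate
    impact under the common four-fifths rule. Illustrative only."""
    totals, favorable = defaultdict(int), defaultdict(int)
    for p, g in zip(preds, groups):
        totals[g] += 1
        favorable[g] += int(p == 1)  # 1 = "favorable" binary prediction
    ref_rate = favorable[reference_group] / totals[reference_group]
    return {g: (favorable[g] / totals[g]) / ref_rate for g in totals}

# Toy usage with three hypothetical groups (not real tribal data):
preds  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]
print(disparate_impact(preds, groups, reference_group="A"))
# -> {'A': 1.0, 'B': 1.0, 'C': 0.75}; group C would fall below the 0.8 threshold

This is a minimal sketch under the assumption of binary predictions and a single group attribute per respondent; the paper's actual framework also covers continuous assessments and sociolinguistic features.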
Recommended Citation
Oketch, Kezia; Lalor, John P.; and Abbasi, Ahmed, "Cultural Artifacts, Tribal Heterogeneity, and Language Models" (2025). ICIS 2025 Proceedings. 12.
https://aisel.aisnet.org/icis2025/ethical_is/ethical_is/12
Cultural Artifacts, Tribal Heterogeneity, and Language Models
Comments
05-ResponsibleIS