Paper Type
Short
Paper Number
PACIS2025-1927
Description
The spread of hate speech on social media and platforms’ reluctance to adopt the most prominent solutions necessitate more advanced methods for mitigating hate speech. Text detoxification (TD) is a frontier approach that transforms the style of a text to eliminate hateful content while preserving its original meaning. It shows promise for removing online hateful content while avoiding risks to free speech. Although emerging studies have begun exploring algorithmic advancements in TD using large language models (LLMs), current understanding remains too limited to guide the development of effective TD systems. Therefore, our study rigorously reviewed 21 selected studies on TD to summarize existing knowledge and identify the challenges. We then propose a three-aspect framework comprising nine factors that effective TD systems should incorporate. Future research will extend this effort by developing a computational artifact for TD, with the potential to significantly enhance this frontier approach to hate speech mitigation.
Recommended Citation
Phan, Thuy Linh (Isabella); Xie, Hetiao (Slim); Namvar, Morteza; and Risius, Marten, "The New Frontier in Mitigating Hate Speech: A Review to Guide Text Detoxification" (2025). PACIS 2025 Proceedings. 12.
https://aisel.aisnet.org/pacis2025/sm_digcollab/sm_digcollab/12
The New Frontier in Mitigating Hate Speech: A Review to Guide Text Detoxification
Comments
Social