Paper Type

Short

Paper Number

PACIS2025-1927

Description

The spread of hate speech on social media and platforms’ reluctance to adopt the most prominent solutions necessitate more advanced methods for mitigating it. Text detoxification (TD) is a frontier approach that transforms the style of a text to eliminate hateful content while preserving its original meaning. This approach shows promising potential to remove hateful content online while avoiding risks to free speech. Although emerging studies have begun exploring algorithmic advances using large language models (LLMs) for TD, current understanding remains too limited to guide the development of effective TD systems. Therefore, our study rigorously reviewed 21 selected studies on TD to summarize existing knowledge and identify the challenges. We then propose a three-aspect framework with nine factors that effective TD systems should incorporate. Future research will extend this effort to develop a computational artifact for TD, with the potential to significantly enhance this frontier approach for hate speech mitigation.

Comments

Social

Jul 6th, 12:00 AM

The New Frontier in Mitigating Hate Speech: A Review to Guide Text Detoxification
