Paper Number

ICIS2025-2633

Paper Type

Short

Abstract

Evaluating the effectiveness of hate speech detoxification is an emerging challenge, particularly as large language models (LLMs) become central to content moderation. While text detoxification (TD) presents a promising alternative to deletion or banning, current evaluation methods remain limited. Human evaluation is costly and inconsistent, and existing automatic metrics often fail to capture social sensitivity. We introduce SAFE-TD, a Structured Agentic Framework for Evaluation of TD, which simulates three agent roles to assess detoxified outputs from multiple perspectives. Our preliminary analysis reveals four outcome types and identifies a critical risk: the generation of implicit hate speech that appears neutral but retains harmful meaning. These findings expose under-explored trade-offs in TD and limitations in existing evaluation practices. SAFE-TD contributes a scalable, socially grounded approach to evaluating LLM-based TD, offering a foundation for more ethical and nuanced AI development for online safety.

Comments

12-GenAI

Dec 14th, 12:00 AM

Same Same but Different: Evaluating Hate Speech Detoxification through an LLM-based Agentic Framework