Paper Number

ICIS2025-1257

Paper Type

Complete

Abstract

Large language models (LLMs) have become foundational in information systems, yet remain vulnerable to reasoning distortions, particularly under biased prompts. This paper introduces a modular correction mechanism for streamed chain-of-thought (CoT) reasoning, enabling real-time detection and revision of intermediate errors without altering model parameters. The framework integrates three diagnostic modules—logical structure validation, causal sufficiency via NLI and causal graphs, and semantic redundancy analysis—to identify and locally correct flawed reasoning steps. Experiments on BBH and SHARC datasets show consistent improvements in logical consistency and robustness across proprietary (GPT-3.5, GPT-4o) and open-source (Mistral, LLaMA) models, especially under biased conditions. Our findings highlight the potential of step-level correction to enhance interpretability and trustworthiness of LLMs in high-stakes IS applications such as compliance and policy analysis.
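To make the abstract's architecture concrete, the sketch below illustrates one way a streamed, step-level correction loop with three diagnostic modules could be wired together. It is purely illustrative and not taken from the paper: the names (Diagnosis, check_logical_structure, check_causal_sufficiency, check_redundancy, corrected_stream, llm_revise, nli) are hypothetical, and the checks are crude stand-ins (e.g., lexical overlap for semantic redundancy, an injected NLI callable for causal sufficiency) for the paper's actual modules.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical verdict produced by the three diagnostic modules for one CoT step.
@dataclass
class Diagnosis:
    logical_ok: bool
    causally_sufficient: bool
    redundant: bool

    @property
    def needs_revision(self) -> bool:
        return not (self.logical_ok and self.causally_sufficient) or self.redundant


def check_logical_structure(step: str, history: List[str]) -> bool:
    # Placeholder: a real module would parse the step into premises and a
    # conclusion and validate the inference pattern; here we only require text.
    return bool(step.strip())


def check_causal_sufficiency(step: str, history: List[str],
                             nli: Callable[[str, str], str]) -> bool:
    # Placeholder: the paper combines NLI with causal graphs; this sketch only
    # asks an injected NLI callable whether prior steps contradict the new one.
    premise = " ".join(history) if history else step
    return nli(premise, step) != "contradiction"


def check_redundancy(step: str, history: List[str], threshold: float = 0.9) -> bool:
    # Crude lexical-overlap proxy for semantic redundancy between steps.
    tokens = set(step.lower().split())
    for prev in history:
        prev_tokens = set(prev.lower().split())
        if tokens and prev_tokens:
            overlap = len(tokens & prev_tokens) / len(tokens | prev_tokens)
            if overlap >= threshold:
                return True
    return False


def corrected_stream(steps: Iterable[str],
                     llm_revise: Callable[[str, List[str]], str],
                     nli: Callable[[str, str], str]) -> Iterable[str]:
    """Yield CoT steps as they stream in, locally revising any flagged step."""
    history: List[str] = []
    for step in steps:
        diag = Diagnosis(
            logical_ok=check_logical_structure(step, history),
            causally_sufficient=check_causal_sufficiency(step, history, nli),
            redundant=check_redundancy(step, history),
        )
        if diag.needs_revision:
            # Local correction of the single step; model parameters are untouched.
            step = llm_revise(step, history)
        history.append(step)
        yield step
```

In this reading, `llm_revise` would be a prompt to the same (or an auxiliary) model asking it to rewrite only the flagged step given the accepted history, which matches the abstract's claim that correction happens without altering model parameters.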

Comments

12-GenAI

Dec 14th, 12:00 AM

Bias Mitigation in Large Language Models: Streamed Correction of Chain-of-Thought

