Loading...

Media is loading
 

Paper Number

1368

Paper Type

Complete

Abstract

State-of-the-art GPT models are being increasingly used in different domains. However, previous research has discovered stereotypical biases in their outputs. To address this, we proposes a two-phase approach. During Phase I, we use benchmark datasets to identify bias. Based on the literature, we define three metrics that consider both the preference toward stereotypes and the ability of models to refuse providing an answer when prompted with biased content. In Phase II, we examine the stability of the results from Phase I by an adversarial attack. We apply our approach to GPT-3.5 Turbo and GPT-4 using the BBQ and CrowS-Pairs benchmark datasets. The evaluation shows that both models are biased toward stereotypes with GPT-4 refusing to answer more often. However, if it does so, the results are more biased. Additionally, the adversarial attack could induce the models to provide an answer, but that did not impact the level of bias substantially.

Comments

10-AI

Share

COinS
 
Dec 15th, 12:00 AM

Evaluation of Stereotypical Biases in Recent GPT Models

State-of-the-art GPT models are being increasingly used in different domains. However, previous research has discovered stereotypical biases in their outputs. To address this, we proposes a two-phase approach. During Phase I, we use benchmark datasets to identify bias. Based on the literature, we define three metrics that consider both the preference toward stereotypes and the ability of models to refuse providing an answer when prompted with biased content. In Phase II, we examine the stability of the results from Phase I by an adversarial attack. We apply our approach to GPT-3.5 Turbo and GPT-4 using the BBQ and CrowS-Pairs benchmark datasets. The evaluation shows that both models are biased toward stereotypes with GPT-4 refusing to answer more often. However, if it does so, the results are more biased. Additionally, the adversarial attack could induce the models to provide an answer, but that did not impact the level of bias substantially.

When commenting on articles, please be friendly, welcoming, respectful and abide by the AIS eLibrary Discussion Thread Code of Conduct posted here.