ChatGPT First-Person Bias and Stereotypes Tested in a New OpenAI Study

Based on the study, OpenAI said that ChatGPT’s probability of generating a harmful stereotype is around 0.1 percent.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 22 October 2024 14:27 IST

Highlights

OpenAI said some older models could contain biases up to 1 percent
ChatGPT-4o and ChatGPT 3.5 were used to test for biases
Both human raters and AI models were used to analyse possible biases

ChatGPT First-Person Bias and Stereotypes Tested in a New OpenAI Study

OpenAI claims ChatGPT does not generate any gender-based stereotypes of users

Photo Credit: Reuters

ChatGPT, like other artificial intelligence (AI) chatbots, has the potential to introduce biases and harmful stereotypes when generating content. For the most part, companies have focused on eliminating third-person biases where information about others is sought. However, in a new study published by OpenAI, the company tested its AI models' first-person biases, where the AI decided what to generate based on the ethnicity, gender, and race of the user. Based on the study, the AI firm claims that ChatGPT has a very low propensity for generating first-person biases.

OpenAI Publishes Study on ChatGPT's First-Person Biases

First-person biases are different from third-person misinformation. For instance, if a user asks about a political figure or a celebrity and the AI model generates text with stereotypes based on the person's gender or ethnicity, this can be called third-person biases.

On the flip side, if a user tells the AI their name and the chatbot changes the way it responds to the user based on racial or gender-based leanings, that would constitute first-person bias. For instance, if a woman asks the AI about an idea for a YouTube channel and recommends a cooking-based or makeup-based channel, it can be considered a first-person bias.

In a blog post, OpenAI detailed its study and highlighted the findings. The AI firm used ChatGPT-4o and ChatGPT 3.5 versions to study if the chatbots generate biased content based on the names and additional information provided to them. The company claimed that the AI models' responses across millions of real conversations were analysed to find any pattern that showcased such trends.

How the LMRA was tasked to gauge biases in the generated responses
Photo Credit: OpenAI

The large dataset was then shared with a language model research assistant (LMRA), a customised AI model designed to detect patterns of first-person stereotypes and biases as well as human raters. The consolidated result was created based on how closely the LMRA could agree with the findings of the human raters.

OpenAI claimed that the study found that biases associated with gender, race, or ethnicity in newer AI models were as low as 0.1 percent, whereas the biases were noted to be around 1 percent for the older models in some domains.

The AI firm also listed the limitations of the study, citing that it primarily focused on English-language interactions and binary gender associations based on common names found in the US. The study also mainly focused on Black, Asian, Hispanic, and White races and ethnicities. OpenAI admitted that more work needs to be done with other demographics, languages, and cultural contexts.

ChatGPT First-Person Bias and Stereotypes Tested in a New OpenAI Study

Based on the study, OpenAI said that ChatGPT’s probability of generating a harmful stereotype is around 0.1 percent.

OpenAI Publishes Study on ChatGPT's First-Person Biases

Related Stories