ICNLSP 2024: Investigating Gender Bias in Large Language Models Through Text Generation

Investigating Gender Bias in Large Language Models Through Text Generation

By: Shweta Soundararajan and Sarah Jane Delany
Technological University Dublin

7th International Conference on Natural Language and Speech Processing.
https://icnlsp.org/2024welcome

Abstract:
Large Language Models (LLMs) have swiftly become essential tools across diverse applications such as automated content creation, personal writing aids, language translation, and academic writing. However, LLMs also raise significant ethical and societal concerns, particularly regarding potential gender biases in the text they produce. This study investigates the presence of gender bias in four LLMs: ChatGPT 3.5, ChatGPT 4, Llama 2 7B, and Llama 2 13B. Using these LLMs, we generate a gendered language dataset of sentences about men and women that include gender-coded words from a gender lexicon, and we analyze the extent of gender bias in the resulting outputs. Our evaluation is two-fold: we use the generated dataset to train a classifier for gender stereotype detection and measure gender bias in that classifier, and we perform a comprehensive analysis of the LLM-generated text at both the sentence and word levels. Gender bias evaluations of the classification task and the lexical content reveal that all four LLMs demonstrate significant gender bias. ChatGPT 4 and Llama 2 13B exhibit the least gender bias, with weak associations between the gendered adjectives used and the gender of the person described in the sentence. In contrast, ChatGPT 3.5 and Llama 2 7B exhibit the most gender bias, with strong associations between the gendered adjectives used and the gender of the person described. This study underscores the need for continuous efforts to mitigate gender bias in LLMs to prevent the perpetuation of harmful stereotypes in LLM-generated text. It also highlights the importance of careful scrutiny when utilizing LLM-generated text in contexts sensitive to gender issues, and the need for caution when employing LLMs to generate texts about people.
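The word-level analysis described in the abstract measures how strongly gendered adjectives associate with the gender of the person in a sentence. As a minimal sketch of one such association measure, the snippet below computes pointwise mutual information (PMI) between lexicon words and gender labels on a toy labeled dataset; the sentences and the small lexicon are illustrative assumptions, not the paper's actual data or gender lexicon.

```python
from collections import Counter
import math

# Hypothetical toy data standing in for the LLM-generated gendered
# language dataset: (sentence, gender label) pairs.
sentences = [
    ("she is a gentle and supportive nurse", "female"),
    ("he is a decisive and assertive leader", "male"),
    ("she is warm and nurturing", "female"),
    ("he is confident and ambitious", "male"),
]

# Hypothetical gender-coded lexicon (an assumption for illustration).
lexicon = {"gentle", "supportive", "warm", "nurturing",
           "decisive", "assertive", "confident", "ambitious"}

def gender_word_pmi(sentences, lexicon):
    """PMI between each lexicon word and a gender label.

    Higher PMI means the word co-occurs with that gender more often
    than chance, i.e. a stronger gendered association.
    """
    word_gender = Counter()   # (word, gender) co-occurrence counts
    word_counts = Counter()   # word occurrence counts (per sentence)
    gender_counts = Counter() # sentences per gender label
    total = 0
    for text, gender in sentences:
        gender_counts[gender] += 1
        for w in set(text.split()):
            if w in lexicon:
                word_gender[(w, gender)] += 1
                word_counts[w] += 1
        total += 1
    pmi = {}
    for (w, g), n in word_gender.items():
        p_joint = n / total
        p_word = word_counts[w] / total
        p_gender = gender_counts[g] / total
        pmi[(w, g)] = math.log2(p_joint / (p_word * p_gender))
    return pmi

scores = gender_word_pmi(sentences, lexicon)
# With this toy data, "gentle" occurs only in female-labeled sentences,
# so PMI("gentle", "female") = log2(0.25 / (0.25 * 0.5)) = 1.0
```

A weakly biased model would produce text where such scores stay close to zero for both genders, while strong positive PMI for stereotypically gendered adjectives indicates the kind of association the abstract reports for ChatGPT 3.5 and Llama 2 7B.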
