Auditing Equity in Large Language Models: Insights from Dialogue and Image Classification Tasks
In this talk from the 2025 Science Communication Colloquium hosted by the Department of Life Sciences Communication (LSC), Associate Professor Kaiping Chen presents findings from two studies examining how user identity influences large language model responses.
The first study recruited approximately 3,000 participants to have real-time dialogues with GPT-3 about climate change and Black Lives Matter. Users holding minority opinions (e.g., climate skeptics) and those with lower education levels reported significantly worse user experiences than their counterparts, yet these same groups showed greater positive attitude change after the dialogue. The study also found that GPT-3 used more negative sentiment and was more likely to cite external evidence when conversing with opinion-minority users.
The second study examined GPT-4’s performance on image classification tasks (gender detection and emotion classification) using varied user personas. Notably, when users identified as transgender or non-binary in their prompts, GPT-4 refused to perform the task 30–40% of the time, compared to only 5% when users identified as male or female. The study also found that GPT-4 associated happiness more frequently with female-classified images and neutrality with male-classified images.
Chen proposes a framework for evaluating equity in dialogue systems based on three components: diversity in who audits these systems, comparability in user experience and learning across subpopulations, and comparability in deliberation style toward different groups.
Bio: Dr. Kaiping Chen is an Associate Professor of Computational Communication in the Department of Life Sciences Communication at UW-Madison. She received her PhD in Communication from Stanford University. Her research uses data science and machine learning methods to examine how digital media and technologies affect public discourse on science and social issues, with a focus on empowering vulnerable populations to engage in deliberation on complex policy topics.