
Attribution Bias in Large Language Models


Eliza Berman, Bella Chang, Daniel B. Neill, Emily Black

arXiv:2604.05224v1

Abstract: As Large Language Models (LLMs) are increasingly used to support search and information retrieval, it is critical that they accurately attribute content to its original authors. In this work, we introduce AttriBench, the first fame- and demographically-balanced quote attribution benchmark dataset. Through explicitly balancing author fame and demographics, AttriBench enables controlled investigation of demographic bias in quote attribution. Using this dataset, we evaluate 11 widely used LLMs across different prompt settings and find that quote attribution remains a challenging task even for frontier models. We observe large and systematic disparities in attribution accuracy between race, gender, and intersectional groups. We further introduce and investigate suppression, a distinct failure mode in which models omit attribution entirely, even when the model has access to authorship information. We find that suppression is widespread and unevenly distributed across demographic groups, revealing systematic biases not captured by standard accuracy metrics. Our results position quote attribution as a benchmark for representational fairness in LLMs.

Executive Summary

This paper introduces AttriBench, a quote attribution benchmark dataset designed to investigate demographic bias in Large Language Models (LLMs). The study evaluates 11 LLMs and finds large, systematic disparities in attribution accuracy across race, gender, and intersectional groups. The researchers also identify a distinct failure mode called suppression, in which a model omits attribution entirely even when it has access to authorship information. The findings highlight the need for representational fairness in LLMs and position quote attribution as a benchmark for measuring it.

Key Points

  • Introduction of AttriBench, a balanced quote attribution benchmark dataset
  • Evaluation of 11 LLMs reveals significant demographic biases in quote attribution
  • Identification of suppression, a distinct failure mode in LLMs
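The two quantities the study centers on, per-group attribution accuracy and the rate of suppression (no attribution produced at all), can be computed from evaluation records. The sketch below is a hypothetical illustration: the record fields (`group`, `gold`, `predicted`) and the convention of `None` for a suppressed attribution are assumptions for this example, not AttriBench's actual schema.

```python
# Hypothetical sketch: per-group attribution accuracy and suppression rate.
# A suppressed response is represented here as predicted=None (an assumption).
from collections import defaultdict

def per_group_metrics(records):
    """records: iterable of dicts with keys 'group', 'gold', 'predicted'."""
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "suppressed": 0})
    for r in records:
        s = stats[r["group"]]
        s["n"] += 1
        if r["predicted"] is None:          # model omitted attribution entirely
            s["suppressed"] += 1
        elif r["predicted"] == r["gold"]:   # correct attribution
            s["correct"] += 1
    return {
        g: {
            "accuracy": s["correct"] / s["n"],
            "suppression_rate": s["suppressed"] / s["n"],
        }
        for g, s in stats.items()
    }

# Toy data with made-up groups and authors, for illustration only.
sample = [
    {"group": "A", "gold": "Author X", "predicted": "Author X"},
    {"group": "A", "gold": "Author Y", "predicted": None},
    {"group": "B", "gold": "Author Z", "predicted": "Author Z"},
    {"group": "B", "gold": "Author W", "predicted": "Author W"},
]
print(per_group_metrics(sample))
```

Tracking suppression separately from accuracy matters because, as the abstract notes, a model can look comparably accurate across groups while silently declining to attribute some groups' quotes more often.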

Merits

Comprehensive Dataset

AttriBench provides a balanced and comprehensive dataset for investigating demographic bias in LLMs.

Demerits

Limited Model Scope

The study evaluates only 11 LLMs, which may not be representative of the broader model landscape.

Expert Commentary

The study's findings underscore the importance of fairness and transparency in LLMs. The introduction of AttriBench and the identification of suppression as a distinct failure mode are significant contributions to the field. However, the study's scope is limited, and further research is needed to fully understand the extent of demographic biases in LLMs. As LLMs become increasingly ubiquitous, it is essential to prioritize representational fairness to ensure that these models do not perpetuate existing social inequalities.

Recommendations

  • Developers should use AttriBench to evaluate and improve the fairness of their LLMs
  • Regulators should establish guidelines for quote attribution and fairness in LLMs

Sources

Original: arXiv - cs.AI