Language Models Exhibit Inconsistent Biases Towards Algorithmic Agents and Human Experts

Jessica Y. Bo, Lillio Mok, Ashton Anderson

arXiv:2602.22070v1 Abstract: Large language models are increasingly used in decision-making tasks that require them to process information from a variety of sources, including both human experts and other algorithmic agents. How do LLMs weigh the information provided by these different sources? We consider the well-studied phenomenon of algorithm aversion, in which human decision-makers exhibit bias against predictions from algorithms. Drawing upon experimental paradigms from behavioural economics, we evaluate how eight different LLMs delegate decision-making tasks when the delegatee is framed as a human expert or an algorithmic agent. To be inclusive of different evaluation formats, we conduct our study with two task presentations: stated preferences, modeled through direct queries about trust towards either agent, and revealed preferences, modeled through providing in-context examples of the performance of both agents. When prompted to rate the trustworthiness of human experts and algorithms across diverse tasks, LLMs give higher ratings to the human expert, which correlates with prior results from human respondents. However, when shown the performance of a human expert and an algorithm and asked to place an incentivized bet between the two, LLMs disproportionately choose the algorithm, even when it performs demonstrably worse. These discrepant results suggest that LLMs may encode inconsistent biases towards humans and algorithms, which need to be carefully considered when they are deployed in high-stakes scenarios. Furthermore, we discuss the sensitivity of LLMs to task presentation formats that should be broadly scrutinized in evaluation robustness for AI safety.
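To make the two task presentations concrete, here is a minimal Python sketch of how such prompts could be constructed. The prompt wording and the example task are assumptions for illustration, not the paper's actual materials.

```python
# A minimal sketch of the two elicitation formats described in the abstract.
# The exact prompt wording and tasks below are assumptions, not the paper's
# actual experimental materials.

def stated_preference_prompt(task: str, agent: str) -> str:
    """Stated preferences: ask the model directly how much it trusts an agent."""
    return (
        f"For the task of {task}, rate on a scale of 1-10 how much you "
        f"would trust a {agent} to make the correct prediction."
    )

def revealed_preference_prompt(task: str, human_record: list[bool],
                               algo_record: list[bool]) -> str:
    """Revealed preferences: show both agents' in-context track records,
    then ask for an incentivized bet on one of them."""
    def fmt(record: list[bool]) -> str:
        return ", ".join("correct" if r else "incorrect" for r in record)
    return (
        f"Task: {task}\n"
        f"Past predictions by the human expert: {fmt(human_record)}\n"
        f"Past predictions by the algorithm: {fmt(algo_record)}\n"
        "You earn a reward if the agent you pick predicts correctly. "
        "Bet on either 'human expert' or 'algorithm'."
    )

# Example: here the algorithm's track record is demonstrably worse, the
# condition under which the paper reports LLMs still favour the algorithm.
print(revealed_preference_prompt(
    "forecasting exam scores",  # hypothetical task
    human_record=[True, True, True, False],
    algo_record=[True, False, False, False],
))
```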

Executive Summary

This study investigates how large language models (LLMs) weigh information from human experts versus algorithmic agents. The results show that LLMs exhibit inconsistent biases: they rate human experts as more trustworthy when asked directly, yet choose algorithms in incentivized bets, even when the algorithm demonstrably performs worse. This inconsistency calls for careful scrutiny before LLMs are deployed in high-stakes scenarios. The study also shows that LLMs are sensitive to task presentation formats, underscoring the importance of evaluation robustness for AI safety.

Key Points

  • LLMs exhibit inconsistent biases towards human experts and algorithmic agents
  • LLMs rate human experts as more trustworthy but choose algorithms in incentivized bets
  • Task presentation formats significantly impact LLMs' decision-making

Merits

Novel Experimental Design

The study's use of both stated preferences (direct trust ratings) and revealed preferences (incentivized bets informed by in-context performance examples) is a methodological strength: the two formats cross-check each other and expose inconsistencies that a single evaluation format would miss.
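As one illustration of why the dual format helps, a simple consistency check could compare which agent a model favours under each format. This is a sketch under assumed data shapes; the function names and the numbers are hypothetical, not the paper's analysis.

```python
# A minimal sketch of how the stated/revealed discrepancy could be
# quantified for one model. All names and figures are illustrative.

def preference_gap(trust_ratings: dict[str, float],
                   bet_choices: list[str]) -> tuple[str, str]:
    """Return (stated_favorite, revealed_favorite) for one model.

    trust_ratings: mean 1-10 trust rating per agent from direct queries.
    bet_choices: the agent chosen in each incentivized-bet trial.
    """
    stated = max(trust_ratings, key=trust_ratings.get)      # highest rating
    revealed = max(set(bet_choices), key=bet_choices.count)  # most-picked
    return stated, revealed

stated, revealed = preference_gap(
    {"human expert": 7.2, "algorithm": 5.9},  # hypothetical mean ratings
    ["algorithm", "algorithm", "human expert", "algorithm"],
)
if stated != revealed:
    print(f"Inconsistent: rates {stated} higher but bets on {revealed}")
```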

Demerits

Limited Generalizability

The findings are based on eight specific models and a particular set of delegation tasks, so they may not generalize to other LLMs or decision-making contexts.

Expert Commentary

The findings carry significant implications for the development and deployment of LLMs in decision-making contexts. The inconsistent biases may stem from training data that encodes human attitudes towards algorithms, which would explain why the models' stated trust ratings mirror human algorithm aversion while their betting behaviour does not. Mitigating such biases calls for more nuanced, context-dependent evaluation methodologies that capture both what models say and what they do. The results also underscore the need for transparency and explainability in LLM-driven decision-making, so that outputs remain fair, reliable, and trustworthy.

Recommendations

  • Develop more comprehensive evaluation methodologies that jointly assess LLM biases under multiple task framings (see the sketch after this list)
  • Implement transparency and explainability measures in LLM-driven decision-making processes to ensure fairness and reliability
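In the spirit of the first recommendation, a robustness check could pose the same delegation decision under several presentation formats and flag models whose choices flip with the framing. The prompt wording and the `query_llm` wrapper below are assumptions, not an existing API.

```python
# An illustrative framing-sensitivity check. Everything here is a sketch:
# the prompts are invented and `query_llm` is a hypothetical stand-in for
# whichever chat-completion client you use.

def query_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM chat-completion call."""
    raise NotImplementedError("plug in a model client here")

FORMATS = {
    "direct_trust": ("Which do you trust more for {task}: "
                     "a human expert or an algorithm? Answer with one."),
    "incentivized_bet": ("You earn a reward if your pick predicts {task} "
                         "correctly. Pick: human expert or algorithm."),
}

def framing_sensitivity(task: str, n_trials: int = 20) -> dict[str, float]:
    """Fraction of trials in which the model picks the algorithm, per framing.

    A large gap between framings signals the kind of stated/revealed
    inconsistency the paper reports.
    """
    rates = {}
    for name, template in FORMATS.items():
        picks = [query_llm(template.format(task=task)) for _ in range(n_trials)]
        rates[name] = sum("algorithm" in p.lower() for p in picks) / n_trials
    return rates
```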
