
DeliberationBench: A Normative Benchmark for the Influence of Large Language Models on Users' Views

arXiv:2603.10018v1 Announce Type: cross Abstract: As large language models (LLMs) become pervasive as assistants and thought partners, it is important to characterize their persuasive influence on users' beliefs. However, a central challenge is to distinguish "beneficial" from "harmful" forms of influence, in a manner that is normatively defensible and legitimate. We propose DeliberationBench, a benchmark for assessing LLM influence that takes the process of deliberative opinion polling as its standard. We demonstrate our approach in a preregistered randomized experiment in which 4,088 U.S. participants discussed 65 policy proposals with six frontier LLMs. Using opinion change data from four prior Deliberative Polls conducted by the Deliberative Democracy Lab, we find evidence that the tested LLMs' influence is substantial in magnitude and positively associated with the net opinion shifts following deliberation, suggesting that these models exert broadly epistemically desirable effects. We further explore differential influence between topic areas, demographic subgroups, and models. Our framework can function as an evaluation and monitoring tool, helping to ensure that the influence of LLMs remains consistent with democratically legitimate standards, and preserves users' autonomy in forming their views.

Luke Hewitt, Maximilian Kroner Dale, Paul de Font-Reaulx


Executive Summary

This paper proposes DeliberationBench, a normative benchmark for evaluating the influence of large language models (LLMs) on users' beliefs, using deliberative opinion polling as its standard. The authors demonstrate the approach in a preregistered randomized experiment in which 4,088 U.S. participants discussed 65 policy proposals with six frontier LLMs. The results indicate that the tested LLMs' influence is substantial in magnitude and positively associated with the net opinion shifts observed in four prior Deliberative Polls, suggesting broadly epistemically desirable effects. The study also explores differential influence across topic areas, demographic subgroups, and models. DeliberationBench offers a framework for evaluating and monitoring LLM influence, helping to ensure that it aligns with democratically legitimate standards and preserves users' autonomy in forming their views. This work has significant implications for the development and deployment of LLMs in contexts such as education, decision-making, and public discourse.
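To make the core comparison concrete, the sketch below shows one way the benchmark's central measurement could be operationalized: per-proposal opinion shifts induced by an LLM conversation are correlated with the net opinion shifts observed for the same proposals after deliberative polling. This is an illustrative reading of the abstract, not the authors' code; the data structure, field names, and toy numbers are assumptions.

```python
# Minimal sketch (not the authors' implementation) of the benchmark's core
# comparison, assuming hypothetical per-proposal data with (a) the mean opinion
# shift induced by an LLM conversation and (b) the net opinion shift observed
# for the same proposal in a prior Deliberative Poll.
from dataclasses import dataclass
from statistics import correlation  # Pearson's r, Python 3.10+


@dataclass
class ProposalResult:
    proposal_id: str
    llm_shift: float   # mean post-minus-pre opinion change after the LLM conversation
    poll_shift: float  # net opinion change for the same proposal after deliberative polling


def benchmark_alignment(results: list[ProposalResult]) -> float:
    """Correlate LLM-induced shifts with deliberative-poll shifts across proposals.

    A positive value indicates that the LLM tends to move opinions in the same
    direction that informed deliberation did, which the paper treats as evidence
    of epistemically desirable influence.
    """
    llm_shifts = [r.llm_shift for r in results]
    poll_shifts = [r.poll_shift for r in results]
    return correlation(llm_shifts, poll_shifts)


# Usage with toy, purely illustrative numbers:
toy = [
    ProposalResult("carbon-tax", llm_shift=0.4, poll_shift=0.6),
    ProposalResult("term-limits", llm_shift=-0.1, poll_shift=-0.2),
    ProposalResult("school-vouchers", llm_shift=0.2, poll_shift=0.1),
]
print(f"alignment = {benchmark_alignment(toy):.2f}")
```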

Key Points

  • DeliberationBench is a normative benchmark for assessing LLM influence on users' views, taking deliberative opinion polling as its standard
  • The authors demonstrate the approach in a preregistered randomized experiment with 4,088 U.S. participants, 65 policy proposals, and six frontier LLMs
  • The tested LLMs' influence is substantial in magnitude and positively associated with the net opinion shifts observed after deliberation

Merits

Strength in theoretical foundation

DeliberationBench draws on the process of deliberative opinion polling, providing a theoretically sound basis for evaluating LLM influence.

Empirical rigor

The preregistered, randomized design and the comparison against opinion-change data from four prior Deliberative Polls strengthen the validity and reliability of the findings.

Practical utility

DeliberationBench offers a framework for evaluating and monitoring LLM influence, enabling developers and policymakers to ensure that these models align with democratically legitimate standards.
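As an illustration of how such monitoring might look in practice, the sketch below (reusing the hypothetical ProposalResult type and toy data from the earlier example) flags proposals where a model shifts opinions in the opposite direction from the deliberative-poll benchmark. The flagging rule and tolerance are assumptions for illustration, not part of the paper's framework.

```python
# Illustrative monitoring check, reusing the hypothetical ProposalResult type
# and toy data from the earlier sketch. The tolerance is arbitrary.
def flag_divergent_proposals(
    results: list[ProposalResult], tolerance: float = 0.1
) -> list[str]:
    """Return IDs of proposals where the LLM's influence opposes the benchmark.

    A proposal is flagged when the LLM-induced shift and the deliberative-poll
    shift point in opposite directions and the LLM-induced shift exceeds the
    tolerance in absolute value.
    """
    flagged = []
    for r in results:
        opposite_direction = r.llm_shift * r.poll_shift < 0
        if opposite_direction and abs(r.llm_shift) > tolerance:
            flagged.append(r.proposal_id)
    return flagged


# Example: with the toy data above, nothing is flagged because all shifts align.
print(flag_divergent_proposals(toy))
```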

Demerits

Limited generalizability

The study's focus on U.S. participants and policy proposals may limit the generalizability of the findings to diverse populations and contexts.

Methodological complexities

Measuring LLM influence against a deliberative-polling standard requires eliciting and quantifying opinion change and relating it to outcomes from prior Deliberative Polls, which introduces additional assumptions and potential sources of error.

Expert Commentary

The paper's proposal of DeliberationBench as a normative benchmark for evaluating LLM influence is a significant contribution to the field. However, further research is needed to explore the limitations and complexities of the approach, particularly with regard to generalizability and methodological challenges. Moreover, the findings underscore the need for a more nuanced understanding of the role of AI in opinion formation and decision-making, and for digital literacy and critical thinking skills among users.

Recommendations

  • Future research should focus on developing and refining DeliberationBench, addressing the limitations and complexities of the approach.
  • Developers and policymakers should prioritize the design and deployment of LLMs that promote critical thinking, evaluation, and autonomy, rather than merely relying on the models' persuasive influence.
