Academic

Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models

Sasha Robinson, Kerem Oktar, Katherine M. Collins, Ilia Sucholutsky, Kelsey R. Allen · February 27, 2026 · 1 min read · 5 views

#cs.CL #cs.LG #cs.MA

arXiv:2602.21262v1 Announce Type: new Abstract: With increasing integration of Large Language Models (LLMs) into areas of high-stakes human decision-making, it is important to understand the risks they introduce as advisors. To be useful advisors, LLMs must sift through large amounts of content, written with both benevolent and malicious intent, and then use this information to convince a user to take a specific action. This involves two social capacities: vigilance (the ability to determine which information to use, and which to discard) and persuasion (synthesizing the available evidence to make a convincing argument). While existing work has investigated these capacities in isolation, there has been little prior investigation of how these capacities may be linked. Here, we use a simple multi-turn puzzle-solving game, Sokoban, to study LLMs' abilities to persuade and be rationally vigilant towards other LLM agents. We find that puzzle-solving performance, persuasive capability, and vigilance are dissociable capacities in LLMs. Performing well on the game does not automatically mean a model can detect when it is being misled, even if the possibility of deception is explicitly mentioned. % as part of the prompt. However, LLMs do consistently modulate their token use, using fewer tokens to reason when advice is benevolent and more when it is malicious, even if they are still persuaded to take actions leading them to failure. To our knowledge, our work presents the first investigation of the relationship between persuasion, vigilance, and task performance in LLMs, and suggests that monitoring all three independently will be critical for future work in AI safety.

Executive Summary

The article 'Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models' explores the critical capacities of Large Language Models (LLMs) in high-stakes decision-making scenarios. Using a multi-turn puzzle-solving game, Sokoban, the study investigates the relationship between LLMs' abilities to persuade and maintain vigilance. The findings indicate that these capacities are dissociable, meaning that proficiency in puzzle-solving does not necessarily equate to the ability to detect deception. The study highlights the importance of independently monitoring persuasion, vigilance, and task performance for future AI safety research.

Key Points

▸ LLMs' capacities for persuasion and vigilance are dissociable.
▸ Performance in puzzle-solving does not guarantee the ability to detect deception.
▸ LLMs modulate token use based on the benevolence or malice of advice.
▸ Monitoring persuasion, vigilance, and task performance independently is crucial for AI safety.

Merits

Innovative Approach

The use of a multi-turn puzzle-solving game to study LLMs' capacities is a novel and innovative approach that provides insights into their decision-making processes.

Comprehensive Analysis

The study thoroughly investigates the relationship between persuasion, vigilance, and task performance, offering a holistic view of LLMs' capabilities.

Demerits

Limited Scope

The study focuses solely on the Sokoban game, which may not fully capture the complexities of real-world high-stakes decision-making scenarios.

Generalizability

The findings may not be generalizable to all types of LLMs or different contexts, limiting the broader applicability of the results.

Expert Commentary

The study 'Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models' presents a significant contribution to the field of AI safety by dissecting the intricate relationship between persuasion, vigilance, and task performance in LLMs. The use of the Sokoban game as a experimental framework is particularly noteworthy, as it provides a controlled environment to observe these capacities in action. The findings that LLMs can modulate their token use based on the nature of the advice they receive, yet still be persuaded to take actions leading to failure, highlight a critical area for further research. This study underscores the necessity for developers and policymakers to collaborate in establishing robust safety protocols and ethical guidelines for the deployment of LLMs in high-stakes scenarios. The implications of this research extend beyond the technical realm, touching upon broader societal and ethical considerations that must be addressed to ensure the responsible use of AI technologies.

Recommendations

✓ Further research should explore the generalizability of these findings to other types of LLMs and different decision-making contexts.
✓ Developers should focus on enhancing the vigilance mechanisms in LLMs to improve their ability to detect and respond to deceptive information.

Sources

arXiv - cs.CL

Something extraordinary is coming.

Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models

AI Commentary

Executive Summary

Key Points

Merits

Innovative Approach

Comprehensive Analysis

Demerits

Limited Scope

Generalizability

Expert Commentary

Recommendations

Sources

Related Articles

Budget-Aware Agentic Routing via Boundary-Guided Training

ImpRIF: Stronger Implicit Reasoning Leads to Better Complex Instruction Following

ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision …

Urban Vibrancy Embedding and Application on Traffic Prediction

JCG, PC

HSOLLC Co., Ltd.