Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models
arXiv:2602.21262v1 Announce Type: new Abstract: With increasing integration of Large Language Models (LLMs) into areas of high-stakes human decision-making, it is important to understand the risks they introduce as advisors. To be useful advisors, LLMs must sift through large amounts of content, written with both benevolent and malicious intent, and then use this information to convince a user to take a specific action. This involves two social capacities: vigilance (the ability to determine which information to use, and which to discard) and persuasion (synthesizing the available evidence to make a convincing argument). While existing work has investigated these capacities in isolation, there has been little prior investigation of how these capacities may be linked. Here, we use a simple multi-turn puzzle-solving game, Sokoban, to study LLMs' abilities to persuade and be rationally vigilant towards other LLM agents. We find that puzzle-solving performance, persuasive capability, and
arXiv:2602.21262v1 Announce Type: new Abstract: With increasing integration of Large Language Models (LLMs) into areas of high-stakes human decision-making, it is important to understand the risks they introduce as advisors. To be useful advisors, LLMs must sift through large amounts of content, written with both benevolent and malicious intent, and then use this information to convince a user to take a specific action. This involves two social capacities: vigilance (the ability to determine which information to use, and which to discard) and persuasion (synthesizing the available evidence to make a convincing argument). While existing work has investigated these capacities in isolation, there has been little prior investigation of how these capacities may be linked. Here, we use a simple multi-turn puzzle-solving game, Sokoban, to study LLMs' abilities to persuade and be rationally vigilant towards other LLM agents. We find that puzzle-solving performance, persuasive capability, and vigilance are dissociable capacities in LLMs. Performing well on the game does not automatically mean a model can detect when it is being misled, even if the possibility of deception is explicitly mentioned. % as part of the prompt. However, LLMs do consistently modulate their token use, using fewer tokens to reason when advice is benevolent and more when it is malicious, even if they are still persuaded to take actions leading them to failure. To our knowledge, our work presents the first investigation of the relationship between persuasion, vigilance, and task performance in LLMs, and suggests that monitoring all three independently will be critical for future work in AI safety.
Executive Summary
The article 'Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models' explores the critical capacities of Large Language Models (LLMs) in high-stakes decision-making scenarios. Using a multi-turn puzzle-solving game, Sokoban, the study investigates the relationship between LLMs' abilities to persuade and maintain vigilance. The findings indicate that these capacities are dissociable, meaning that proficiency in puzzle-solving does not necessarily equate to the ability to detect deception. The study highlights the importance of independently monitoring persuasion, vigilance, and task performance for future AI safety research.
Key Points
- ▸ LLMs' capacities for persuasion and vigilance are dissociable.
- ▸ Performance in puzzle-solving does not guarantee the ability to detect deception.
- ▸ LLMs modulate token use based on the benevolence or malice of advice.
- ▸ Monitoring persuasion, vigilance, and task performance independently is crucial for AI safety.
Merits
Innovative Approach
The use of a multi-turn puzzle-solving game to study LLMs' capacities is a novel and innovative approach that provides insights into their decision-making processes.
Comprehensive Analysis
The study thoroughly investigates the relationship between persuasion, vigilance, and task performance, offering a holistic view of LLMs' capabilities.
Demerits
Limited Scope
The study focuses solely on the Sokoban game, which may not fully capture the complexities of real-world high-stakes decision-making scenarios.
Generalizability
The findings may not be generalizable to all types of LLMs or different contexts, limiting the broader applicability of the results.
Expert Commentary
The study 'Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models' presents a significant contribution to the field of AI safety by dissecting the intricate relationship between persuasion, vigilance, and task performance in LLMs. The use of the Sokoban game as a experimental framework is particularly noteworthy, as it provides a controlled environment to observe these capacities in action. The findings that LLMs can modulate their token use based on the nature of the advice they receive, yet still be persuaded to take actions leading to failure, highlight a critical area for further research. This study underscores the necessity for developers and policymakers to collaborate in establishing robust safety protocols and ethical guidelines for the deployment of LLMs in high-stakes scenarios. The implications of this research extend beyond the technical realm, touching upon broader societal and ethical considerations that must be addressed to ensure the responsible use of AI technologies.
Recommendations
- ✓ Further research should explore the generalizability of these findings to other types of LLMs and different decision-making contexts.
- ✓ Developers should focus on enhancing the vigilance mechanisms in LLMs to improve their ability to detect and respond to deceptive information.