Large Language Models Persuade Without Planning Theory of Mind
arXiv:2602.17045v1

Abstract: A growing body of work attempts to evaluate the theory of mind (ToM) abilities of humans and large language models (LLMs) using static, non-interactive question-and-answer benchmarks. However, theoretical work in the field suggests that first-personal interaction is a crucial part of ToM and that such predictive, spectatorial tasks may fail to evaluate it. We address this gap with a novel ToM task that requires an agent to persuade a target to choose one of three policy proposals by strategically revealing information. Success depends on a persuader's sensitivity to a given target's knowledge states (what the target knows about the policies) and motivational states (how much the target values different outcomes). We varied whether these states were Revealed to persuaders or Hidden, in which case persuaders had to inquire about or infer them. In Experiment 1, participants persuaded a bot programmed to make only rational inferences. LLMs excelled in the Revealed condition but performed below chance in the Hidden condition, suggesting difficulty with the multi-step planning required to elicit and use mental state information. Humans performed moderately well in both conditions, indicating an ability to engage such planning. In Experiment 2, where a human target role-played the bot, and in Experiment 3, where we measured whether human targets' real beliefs changed, LLMs outperformed human persuaders across all conditions. These results suggest that effective persuasion can occur without explicit ToM reasoning (e.g., through rhetorical strategies) and that LLMs excel at this form of persuasion. Overall, our results caution against attributing human-like ToM to LLMs while highlighting LLMs' potential to influence people's beliefs and behavior.
Executive Summary
The paper evaluates the theory of mind (ToM) abilities of large language models (LLMs) with a novel interactive persuasion task, comparing human and LLM performance at persuading a target to choose one of three policy proposals by strategically revealing information. LLMs excel when the target's knowledge and motivational states are revealed to them, but fall below chance when those states are hidden and must be elicited or inferred, indicating difficulty with the required multi-step planning; humans perform moderately well in both conditions. With human targets, LLMs outperform human persuaders across all conditions, suggesting that effective persuasion can occur without explicit ToM reasoning. The study cautions against attributing human-like ToM to LLMs while highlighting their potential to influence human beliefs and behavior.
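The paper does not include task code, but its core logic is easy to sketch. Below is a minimal Python illustration, assuming an additive utility over revealed attributes; the class, attribute names, and numbers are hypothetical, not the authors' implementation. It shows why success in the Revealed condition reduces to selective disclosure: a persuader who knows the target's values need only reveal facts that raise the assigned policy's utility or lower a rival's.

```python
# Minimal sketch (not the authors' code) of the task's core logic:
# a "rational bot" target picks whichever of three policy proposals
# maximizes its utility over the attributes revealed so far.
# The utility form, names, and numbers below are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Target:
    # Motivational state: how much the target values each outcome.
    values: dict[str, float]
    # Knowledge state: attributes revealed so far, per policy.
    known: dict[str, dict[str, float]] = field(default_factory=dict)

    def learn(self, policy: str, attribute: str, level: float) -> None:
        """The persuader strategically reveals one attribute of one policy."""
        self.known.setdefault(policy, {})[attribute] = level

    def choose(self, policies: list[str]) -> str:
        """Rational inference: maximize utility over currently known facts."""
        def utility(p: str) -> float:
            return sum(self.values.get(attr, 0.0) * level
                       for attr, level in self.known.get(p, {}).items())
        return max(policies, key=utility)


# Revealed condition: the persuader can read target.values directly,
# so it reveals a favorable fact about its assigned policy ("B") and
# an unfavorable fact about a competitor ("A").
target = Target(values={"jobs": 2.0, "cost": -1.0})
target.learn("B", "jobs", 5.0)   # utility("B") = 2.0 * 5.0 = 10.0
target.learn("A", "cost", 4.0)   # utility("A") = -1.0 * 4.0 = -4.0
print(target.choose(["A", "B", "C"]))  # -> "B"
```

In the Hidden condition, `target.values` is not observable, so the persuader must first elicit or infer it (e.g., by asking what the target cares about) before deciding what to reveal. That extra elicit-then-exploit step is the multi-step planning on which the paper reports LLMs falling below chance.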
Key Points
- LLMs excel at persuasion when the target's knowledge and motivational states are revealed, but perform below chance when those states are hidden and must be elicited or inferred.
- Humans perform moderately well in both the Revealed and Hidden conditions.
- LLMs outperform human persuaders with human targets across all conditions, indicating that effective persuasion can occur without explicit ToM reasoning.
- The study cautions against attributing human-like ToM to LLMs while highlighting their potential to influence human beliefs and behavior.
Merits
Novel Task Design
The study introduces a novel interactive ToM task built around strategic information revelation, addressing the gap left by static, non-interactive question-and-answer benchmarks.
Comprehensive Experiments
The study runs three experiments across Revealed and Hidden conditions and across target types (a rational bot, a human role-playing the bot, and real human targets), providing a robust comparison of human and LLM performance.
Practical Implications
The findings delineate both the persuasive capabilities and the planning limitations of LLMs, with direct relevance to real-world applications that involve influencing people.
Demerits
Limited Generalizability
The findings may not generalize beyond the structured three-proposal policy task to other persuasion tasks or real-world scenarios.
Human Role-Playing Bias
In Experiment 2, having a human target role-play the rational bot may introduce biases that affect the results.
Ethical Considerations
The study raises, but does not fully explore, ethical concerns about the potential of LLMs to influence human beliefs and behavior.
Expert Commentary
The article presents a rigorous, well-designed study that advances our understanding of the theory of mind abilities of large language models. The novel interactive task and the three complementary experiments yield valuable insight into both the capabilities and the limitations of LLMs as persuaders, findings that matter increasingly as AI systems are deployed in applications involving direct human interaction. The caution against attributing human-like ToM to LLMs is well founded: strong persuasive performance emerged even where explicit mental-state planning apparently failed. At the same time, the demonstrated ability of LLMs to shift human beliefs underscores the need for careful regulation and ethical guidelines. Overall, the study makes a substantial contribution to the field and lays a foundation for future research.
Recommendations
- Further research should explore whether the findings generalize to other persuasion tasks and real-world scenarios.
- Ethical guidelines and policies should be developed to regulate the use of LLMs in applications that involve human interaction and decision-making.