DPBench: Large Language Models Struggle with Simultaneous Coordination
arXiv:2602.13255v1 Announce Type: new Abstract: Large language models are increasingly deployed in multi-agent systems, yet we lack benchmarks that test whether they can coordinate under resource contention. We introduce DPBench, a benchmark based on the Dining Philosophers problem that evaluates LLM coordination across eight conditions that vary decision timing, group size, and communication. Our experiments with GPT-5.2, Claude Opus 4.5, and Grok 4.1 reveal a striking asymmetry: LLMs coordinate effectively in sequential settings but fail when decisions must be made simultaneously, with deadlock rates exceeding 95\% under some conditions. We trace this failure to convergent reasoning, where agents independently arrive at identical strategies that, when executed simultaneously, guarantee deadlock. Contrary to expectations, enabling communication does not resolve this problem and can even increase deadlock rates. Our findings suggest that multi-agent LLM systems requiring concurrent resource access may need external coordination mechanisms rather than relying on emergent coordination. DPBench is released as an open-source benchmark. Code and benchmark are available at https://github.com/najmulhasan-code/dpbench.
Executive Summary
The article 'DPBench: Large Language Models Struggle with Simultaneous Coordination' introduces a novel benchmark, DPBench, designed to evaluate the coordination capabilities of large language models (LLMs) in multi-agent systems under resource contention. The study focuses on the Dining Philosophers problem and tests LLMs across various conditions involving decision timing, group size, and communication. The findings reveal a significant asymmetry in LLM performance, where models excel in sequential settings but fail dramatically in simultaneous decision-making scenarios, with deadlock rates exceeding 95% under certain conditions. The study attributes this failure to convergent reasoning, where agents independently adopt identical strategies that lead to deadlock. Surprisingly, enabling communication does not mitigate this issue and can exacerbate it. The authors conclude that multi-agent LLM systems requiring concurrent resource access may need external coordination mechanisms rather than relying on emergent coordination. DPBench is released as an open-source benchmark to facilitate further research.
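The convergent-reasoning failure mode can be illustrated with a minimal sketch (not taken from DPBench; the function and policy names here are hypothetical). In a single synchronous round, each of n philosophers requests one fork. If every agent independently converges on the same "grab the left fork first" policy, all n forks are granted in round one, each agent then waits on a neighbour's fork, and a circular wait is guaranteed; any asymmetric policy breaks the cycle:

```python
def left_first(i, n):
    """Convergent strategy: every philosopher grabs its left fork first.
    Fork i is the left fork of philosopher i."""
    return i

def symmetric_breaker(i, n):
    """Asymmetry: even philosophers grab left first, odd grab right first."""
    return i if i % 2 == 0 else (i + 1) % n

def is_deadlocked(n, first_fork):
    """Simulate one simultaneous round in which philosopher i requests
    first_fork(i, n); each fork goes to its first requester. If all n
    forks end up held, every philosopher holds exactly one fork and
    waits forever on a neighbour's fork: a circular wait, i.e. deadlock."""
    held = {}  # fork index -> philosopher holding it
    for i in range(n):
        held.setdefault(first_fork(i, n), i)
    return len(held) == n
```

Under this toy model, `is_deadlocked(5, left_first)` is true for any group size, while `is_deadlocked(5, symmetric_breaker)` is false, which mirrors the paper's point that identical strategies executed simultaneously guarantee deadlock.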
Key Points
- ▸ Introduction of DPBench, a benchmark for evaluating LLM coordination in multi-agent systems.
- ▸ LLMs perform well in sequential settings but fail in simultaneous decision-making scenarios.
- ▸ Deadlock rates exceed 95% under certain conditions due to convergent reasoning.
- ▸ Communication does not resolve the coordination problem and can increase deadlock rates.
- ▸ Recommendation for external coordination mechanisms in multi-agent LLM systems.
Merits
Novel Benchmark
DPBench provides a valuable tool for assessing LLM coordination in multi-agent systems, addressing a critical gap in current benchmarks.
Comprehensive Analysis
The study thoroughly examines various conditions affecting LLM coordination, offering insights into the limitations of current models.
Practical Implications
The findings have direct implications for the deployment of LLMs in real-world multi-agent systems, highlighting the need for external coordination mechanisms.
Demerits
Limited Scope
The study focuses solely on the Dining Philosophers problem, which may not fully capture the complexity of all real-world coordination scenarios.
Model Selection
The analysis is limited to specific versions of GPT, Claude, and Grok, which may not represent the full spectrum of LLM capabilities.
Communication Analysis
The study's findings on communication's role in coordination are counterintuitive and may require further investigation to fully understand the underlying mechanisms.
Expert Commentary
DPBench is a significant contribution, supplying a much-needed benchmark for evaluating LLM coordination under resource contention. The findings are both surprising and concerning: the stark asymmetry between sequential and simultaneous settings, with deadlock rates exceeding 95% in the latter, underscores the limitations of current models. Attributing these failures to convergent reasoning, in which agents independently derive identical strategies that collide on execution, is a persuasive explanation, though the counterintuitive result that communication can increase deadlock rates warrants deeper investigation. The practical and policy implications are substantial: deployments requiring concurrent resource access should plan for external coordination mechanisms rather than assume coordination will emerge. Overall, the article offers a balanced, objective analysis that advances our understanding of multi-agent LLM coordination and its challenges.
Recommendations
- ✓ Further research should explore the role of communication in LLM coordination to better understand its impact on deadlock rates.
- ✓ Developers should consider integrating external coordination mechanisms into multi-agent LLM systems to prevent deadlocks and ensure efficient resource access.
- ✓ Future benchmarks should expand beyond the Dining Philosophers problem to capture a broader range of coordination scenarios and challenges.
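One classic external coordination mechanism of the kind the recommendations point toward is a global resource ordering (Dijkstra's fix for the Dining Philosophers problem): every agent acquires the lower-numbered fork first, which makes a circular wait impossible. A minimal threaded sketch, not part of DPBench itself:

```python
import threading

N = 5
forks = [threading.Lock() for _ in range(N)]

def philosopher(i, meals, results):
    # External coordination: always acquire the lower-numbered fork first.
    # A total order on resources rules out circular wait, hence deadlock.
    first, second = sorted((i, (i + 1) % N))
    for _ in range(meals):
        with forks[first]:
            with forks[second]:
                results[i] += 1  # "eat" one meal

def run(meals=100):
    """Run all N philosophers concurrently; returns meals eaten per agent."""
    results = [0] * N
    threads = [threading.Thread(target=philosopher, args=(i, meals, results))
               for i in range(N)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With the naive left-then-right policy this program could hang; with the imposed ordering, `run()` always completes, illustrating why an external mechanism succeeds where emergent, symmetric strategies fail.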