MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games
arXiv:2603.09022v1 Abstract: Multi-turn, multi-agent LLM game evaluations often exhibit substantial run-to-run variance. In long-horizon interactions, small early deviations compound across turns and are amplified by multi-agent coupling. This biases win rate estimates and makes rankings unreliable across repeated tournaments. Prompt choice worsens this further by producing different effective policies. We address both instability and underperformance with MEMO (Memory-augmented MOdel context optimization), a self-play framework that optimizes inference-time context by coupling retention and exploration. Retention maintains a persistent memory bank that stores structured insights from self-play trajectories and injects them as priors during later play. Exploration runs tournament-style prompt evolution with uncertainty-aware selection via TrueSkill, and uses prioritized replay to revisit rare and decisive states. Across five text-based games, MEMO raises mean win rate from 25.1% to 49.5% for GPT-4o-mini and from 20.9% to 44.3% for Qwen-2.5-7B-Instruct, using 2,000 self-play games per task. Run-to-run variance also drops, giving more stable rankings across prompt variations. These results suggest that multi-agent LLM game performance and robustness have substantial room for improvement through context optimization. MEMO achieves the largest gains in negotiation and imperfect-information games, while RL remains more effective in perfect-information settings.
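To make the retention mechanism concrete, the sketch below shows one plausible shape for a persistent memory bank that stores structured insights from self-play and injects them as a prompt prior in later games. All names here (`MemoryBank`, `as_prior`, the sample lessons) are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryBank:
    """Persistent store of structured insights mined from self-play games."""
    insights: list = field(default_factory=list)

    def add(self, game: str, turn: int, lesson: str) -> None:
        # Each entry is a structured record, not a raw transcript dump.
        self.insights.append({"game": game, "turn": turn, "lesson": lesson})

    def as_prior(self, game: str, k: int = 3) -> str:
        # Inject the k most recent insights for this game as a textual
        # prior, to be prepended to the prompt before the current state.
        relevant = [m["lesson"] for m in self.insights if m["game"] == game]
        bullets = "\n".join(f"- {l}" for l in relevant[-k:])
        return f"Lessons from earlier self-play:\n{bullets}\n" if bullets else ""

bank = MemoryBank()
bank.add("negotiation", 4, "Opening with an extreme anchor invites early walk-away.")
bank.add("negotiation", 9, "Concede slowly; large concessions signal a weak reservation price.")
prompt_prefix = bank.as_prior("negotiation")
```

The key design point, as described in the abstract, is that memory persists across games and is consumed at inference time only, leaving model weights untouched.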
Executive Summary
This article summarizes MEMO (Memory-augmented Model context optimization), a self-play framework that addresses instability and underperformance in multi-turn, multi-agent LLM game evaluations. MEMO optimizes inference-time context by coupling retention (a persistent memory bank of structured self-play insights) with exploration (tournament-style prompt evolution and prioritized replay), which raises win rates substantially and reduces run-to-run variance. The largest gains appear in negotiation and imperfect-information games, while reinforcement learning remains more effective in perfect-information settings. Overall, the results suggest that multi-agent LLM game performance and robustness have substantial room for improvement through context optimization alone.
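On the exploration side, candidate prompts are ranked with uncertainty-aware ratings. The sketch below illustrates the idea behind TrueSkill-style selection: each prompt's skill is modeled as a Gaussian, match outcomes shift the means and shrink the uncertainty, and selection uses a conservative lower bound mu - k*sigma so that a prompt must be reliably strong, not merely lucky in a few games. The update rule here is a deliberately simplified stand-in (real TrueSkill uses a factor-graph message-passing update), and all names are hypothetical.

```python
class PromptRating:
    """TrueSkill-style rating: skill ~ N(mu, sigma^2)."""
    def __init__(self, mu: float = 25.0, sigma: float = 25.0 / 3):
        self.mu, self.sigma = mu, sigma

def conservative_skill(r: PromptRating, k: float = 3.0) -> float:
    # Lower confidence bound: penalize prompts we are still uncertain about.
    return r.mu - k * r.sigma

def update_winner_loser(winner: PromptRating, loser: PromptRating,
                        step: float = 1.0) -> None:
    # Simplified update: shift means toward the observed result, weighted
    # by the opponent's uncertainty, then shrink both uncertainties.
    total = winner.sigma + loser.sigma + 1e-9
    winner.mu += step * loser.sigma / total
    loser.mu -= step * winner.sigma / total
    for r in (winner, loser):
        r.sigma = max(0.5, r.sigma * 0.95)

a, b = PromptRating(), PromptRating()
for _ in range(10):  # prompt A keeps winning its tournament matches
    update_winner_loser(a, b)
```

After these matches, A's conservative skill exceeds B's, so A's prompt survives into the next round of evolution while B's is mutated or retired.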
Key Points
- ▸ MEMO addresses instability and underperformance in multi-turn, multi-agent LLM game evaluations
- ▸ The framework optimizes inference-time context through retention and exploration
- ▸ MEMO achieves significant gains in negotiation and imperfect-information games
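The prioritized replay component mentioned in the abstract can be sketched as a max-priority buffer that resurfaces rare and decisive states first. This heap-based version is an assumption about the mechanism, not the paper's code; `PrioritizedReplay` and the priority values are illustrative.

```python
import heapq
import itertools

class PrioritizedReplay:
    """Max-priority buffer that resurfaces rare, decisive states first."""
    def __init__(self):
        self._heap = []
        self._count = itertools.count()  # tie-breaker for equal priorities

    def push(self, state: str, priority: float) -> None:
        # Negate priority: heapq is a min-heap, and we want the largest first.
        heapq.heappush(self._heap, (-priority, next(self._count), state))

    def pop(self) -> str:
        _, _, state = heapq.heappop(self._heap)
        return state

buf = PrioritizedReplay()
buf.push("routine midgame state", priority=0.2)
buf.push("decisive endgame blunder", priority=0.9)
buf.push("rare bluff sequence", priority=0.7)
first = buf.pop()  # the highest-priority (most decisive) state
```

Revisiting such states disproportionately is what lets the framework spend its limited self-play budget on the situations that actually decide games.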
Merits
Addresses Evaluation Instability
Directly targeting run-to-run variance and prompt-induced policy drift in multi-agent LLM game evaluations is a significant contribution, since both problems bias win-rate estimates and destabilize rankings across repeated tournaments.
Improves Win Rates
MEMO roughly doubles mean win rate across the five reported games (25.1% to 49.5% for GPT-4o-mini; 20.9% to 44.3% for Qwen-2.5-7B-Instruct), a substantial improvement over unoptimized prompting.
Adaptability
MEMO's ability to adapt to dynamic environments and learn from self-play makes it a promising technique for improving the reliability of LLM game evaluations.
Demerits
Dependence on Self-Play
MEMO's reported results rely on 2,000 self-play games per task, and generating that volume of data can be time-consuming and resource-intensive.
Limited Transferability
MEMO's gains may not be transferable to other LLM game settings or objectives, which could limit its applicability in practice.
Perfect-Information Settings
In perfect-information settings, RL remains more effective than MEMO, so the framework's advantage is concentrated in negotiation and imperfect-information games; this may limit its use where the full game state is observable.
Expert Commentary
The proposed MEMO framework is a significant contribution to multi-agent LLM game evaluation. However, its dependence on large volumes of self-play data and its uncertain transferability to other games and objectives may limit its applicability in practice. The study also underscores that evaluation protocols must account for how small early deviations compound across turns in long-horizon, multi-agent interactions. More broadly, LLM games remain a useful testbed for studying complex, interactive decision-making, which makes this a promising line of research for both researchers and practitioners.
Recommendations
- ✓ Future research should focus on exploring the transferability of MEMO to other LLM game settings and objectives.
- ✓ Researchers should consider developing more efficient methods for generating self-play data to reduce the computational burden of MEMO.