Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning
arXiv:2602.18493v1 Announce Type: new Abstract: Long-context LLMs and Retrieval-Augmented Generation (RAG) systems process information passively, deferring state tracking, contradiction resolution, and evidence aggregation to query time, which becomes brittle under ultra-long streams with frequent updates. We propose the Unified Memory Agent (UMA), an end-to-end reinforcement learning framework that unifies memory operations and question answering within a single policy. UMA maintains a dual memory representation: a compact core summary for global context and a structured Memory Bank that supports explicit CRUD (create, update, delete, reorganize) over key-value entries, enabling proactive consolidation during streaming. To evaluate long-horizon memory behavior, we introduce Ledger-QA, a diagnostic benchmark for continuous state tracking where answers are latent values derived from accumulated updates rather than local span retrieval. Across 13 datasets spanning Ledger-QA, Test-Time Learning, and Accurate Retrieval, UMA substantially outperforms long-context and RAG baselines on dynamic reasoning and learning tasks while remaining competitive on standard retrieval benchmarks, underscoring the importance of learned, end-to-end memory management.
Executive Summary
This article proposes the Unified Memory Agent (UMA), an end-to-end reinforcement learning framework that unifies memory operations and question answering within a single policy. UMA maintains a dual memory representation to support proactive consolidation during streaming, enabling it to outperform long-context and Retrieval-Augmented Generation (RAG) baselines on dynamic reasoning and learning tasks. The framework is evaluated on 13 datasets spanning Ledger-QA, Test-Time Learning, and Accurate Retrieval. The results underscore the importance of learned, end-to-end memory management for processing ultra-long streams with frequent updates, and suggest applications in memory-intensive tasks such as question answering, data retrieval, and content creation.
Key Points
- ▸ The UMA framework unifies memory operations and question answering within a single policy.
- ▸ UMA maintains a dual memory representation to support proactive consolidation during streaming.
- ▸ UMA outperforms long-context and RAG baselines on dynamic reasoning and learning tasks.
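To make the dual memory representation concrete, here is a minimal sketch of a compact core summary paired with a key-value Memory Bank supporting the explicit CRUD (create, update, delete, reorganize) operations the abstract describes. All class and method names are hypothetical illustrations, and the trivial rule in `consume` stands in for the learned RL policy that UMA actually trains; none of this is the paper's API.

```python
class MemoryBank:
    """Structured key-value store with explicit create/update/delete/reorganize."""

    def __init__(self):
        self.entries = {}

    def create(self, key, value):
        self.entries[key] = value

    def update(self, key, value):
        # Overwrite the latest state for a tracked entity (e.g. a ledger balance),
        # so the answer is the accumulated value, not a local span.
        self.entries[key] = value

    def delete(self, key):
        self.entries.pop(key, None)

    def reorganize(self, merge_keys, new_key):
        # Consolidate several related entries under one key.
        merged = " ; ".join(
            str(self.entries.pop(k)) for k in merge_keys if k in self.entries
        )
        if merged:
            self.entries[new_key] = merged


class UnifiedMemory:
    """Dual representation: compact core summary plus structured Memory Bank."""

    def __init__(self):
        self.core_summary = ""    # global context, kept short
        self.bank = MemoryBank()  # explicit per-entity state

    def consume(self, chunk):
        # In UMA a learned policy chooses the memory operation; as a
        # placeholder heuristic, "key = value" lines update the bank and
        # everything else extends the core summary.
        if "=" in chunk:
            key, value = (s.strip() for s in chunk.split("=", 1))
            self.bank.update(key, value)
        else:
            self.core_summary = (self.core_summary + " " + chunk).strip()


mem = UnifiedMemory()
for event in ["alice_balance = 10", "Alice opens an account.", "alice_balance = 25"]:
    mem.consume(event)
print(mem.bank.entries["alice_balance"])  # latest accumulated state: 25
```

The point of the sketch is the consolidation-during-streaming behavior: by the time a question arrives, the bank already holds the latest value, so answering does not require re-scanning the stream.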
Merits
Strength
The UMA framework's ability to learn and adapt to new information in real time enables it to outperform traditional long-context and RAG systems on dynamic reasoning and learning tasks.
Demerits
Limitation
The UMA framework requires a significant amount of computational resources and training data, which may be a barrier to adoption in resource-constrained environments.
Expert Commentary
The UMA framework is a notable advance in memory-augmented neural networks. By learning memory management end-to-end with reinforcement learning, and by pairing a compact core summary with a structured Memory Bank, it outperforms long-context and RAG baselines on dynamic reasoning and learning tasks while remaining competitive on standard retrieval benchmarks. Although training the unified policy demands substantial computational resources and data, the gains on continuous state tracking suggest the investment is worthwhile for memory-intensive applications such as question answering and data retrieval, and for domains with rapidly updating records, such as healthcare and finance.
Recommendations
- ✓ Further research is needed to improve the efficiency and scalability of the UMA framework.
- ✓ The UMA framework should be applied to a wider range of tasks and domains to fully understand its potential benefits and limitations.