Compiled Memory: Not More Information, but More Precise Instructions for Language Agents
arXiv:2603.15666v1 Announce Type: new Abstract: Existing memory systems for language agents address memory management: how to retrieve and page more information within a context budget. We address a complementary problem -- memory utility: what experience is worth keeping, and how it should change agent behavior. We present Atlas, a memory kernel that compiles accumulated task experience into an agent's instruction structure -- without fine-tuning, RAG, or human intervention. Memory is distillation, not storage; delivery is instruction rewriting, not context injection. Facts extracted from agent failures and successes are verified through a three-step promotion gate and delivered by rewriting the agent's system prompt with learned sub-bullets. On CUAD contract analysis, the evolved prompt improves GPT-4o token-level F1 by $+8.7$pp and precision by $+12.5$pp. On HotpotQA multi-hop QA, joint F1 improves $+3.16$pp. An ablation isolates the mechanism's defining property -- the training signal constraint: the evolved prompt learns exactly what it is taught, and nothing more. Applied to Claude Sonnet~4.5 using the same evolved prompt -- compiled from GPT-4o errors, unchanged -- joint F1 improves $+2.31$pp, with gains concentrating where Claude's stronger baseline leaves the most room -- confirming that the compiled knowledge is task-shaped, not model-shaped.
Executive Summary
The article presents Atlas, a memory kernel that compiles accumulated task experience into an agent's instruction structure without fine-tuning, retrieval augmentation, or human intervention. On CUAD contract analysis, the evolved prompt improves GPT-4o token-level F1 by $+8.7$pp and precision by $+12.5$pp; on HotpotQA multi-hop QA, joint F1 improves by $+3.16$pp. The compiled knowledge is task-shaped rather than model-shaped: the same evolved prompt, compiled from GPT-4o errors and applied unchanged, improves Claude Sonnet 4.5's joint F1 by $+2.31$pp. Atlas thus addresses memory utility (which experience is worth keeping, and how it should change agent behavior) as a complement to memory management, and is most promising for tasks that reward precise instructions.
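To make the described pipeline concrete, here is a minimal sketch of the compile-experience-into-instructions idea: candidate facts distilled from task traces pass a three-step promotion gate before being delivered as sub-bullets appended to the system prompt. All names, the gate criteria, and the `Fact` schema are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    """A candidate lesson distilled from a task trace (hypothetical schema)."""
    text: str
    support: int = 1       # how many traces exhibited this lesson
    verified: bool = False # passed a held-out verification check

def promotion_gate(fact: Fact, min_support: int = 2) -> bool:
    """Toy three-step gate (steps are illustrative, not from the paper):
    1) filter trivial facts, 2) require recurrence, 3) require verification."""
    if len(fact.text.split()) < 4:   # step 1: too short to be actionable
        return False
    if fact.support < min_support:   # step 2: must recur across traces
        return False
    return fact.verified             # step 3: held-out verification

def compile_prompt(base_prompt: str, facts: list[Fact]) -> str:
    """Deliver promoted facts by rewriting the system prompt with sub-bullets,
    rather than injecting retrieved context at query time."""
    promoted = [f for f in facts if promotion_gate(f)]
    bullets = "\n".join(f"  - {f.text}" for f in promoted)
    return base_prompt + ("\nLearned guidance:\n" + bullets if bullets else "")

base = "You extract clauses from contracts."
facts = [
    Fact("Quote clause spans verbatim; do not paraphrase.", support=3, verified=True),
    Fact("Be brief.", support=5, verified=True),  # filtered at step 1 as trivial
]
print(compile_prompt(base, facts))
```

The sketch illustrates the "training signal constraint" the abstract highlights: only facts that survive the gate alter the prompt, so the evolved prompt learns exactly what it is taught and nothing more.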
Key Points
- ▸ Atlas, a memory kernel, compiles accumulated task experience into an agent's instruction structure
- ▸ Atlas improves language agent performance without human intervention or fine-tuning
- ▸ The proposed mechanism is task-shaped, not model-shaped, demonstrating its effectiveness on different language models
Merits
Strength in Task-Shaped Learning
The Atlas mechanism learns from task-specific experiences and adapts to the particular requirements of each task, leading to improved performance.
Efficient Memory Use
Atlas treats memory as distillation rather than storage: accumulated experience is compiled into the agent's instruction structure instead of retained as raw traces, avoiding large memory stores and keeping delivery within the existing prompt.
Demerits
Limited Generalizability
The effectiveness of Atlas is demonstrated on specific tasks and language models, and its generalizability to other domains and tasks remains to be explored.
Dependence on High-Quality Training Data
The quality and accuracy of the training data used to compile the agent's instruction structure are crucial for the success of Atlas, and poor data quality may lead to suboptimal performance.
Expert Commentary
The article presents a novel and promising approach: rather than retrieving more context, Atlas distills experience into more precise instructions, framing memory as a utility problem (which experience matters, and how it should change agent behavior) that complements existing memory-management work. The evidence, however, covers two tasks and two model families, so generalizability to other domains remains open. The approach also depends on accurate fact extraction from agent traces; a noisy training signal could propagate errors into the evolved prompt. Nonetheless, for tasks that reward precise instructions, the demonstrated gains are substantial.
Recommendations
- ✓ Further research is needed to explore the generalizability of Atlas to other domains and tasks
- ✓ High-quality training data is essential for the success of Atlas, and methods for ensuring data accuracy and reliability should be developed