HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents
arXiv:2603.00977v1 (Announce Type: new)

Abstract: Large language model (LLM) agents have recently demonstrated strong capabilities in interactive decision-making, yet they remain fundamentally limited in long-horizon tasks that require structured planning and reliable execution. Existing approaches predominantly rely on flat autoregressive policies, where high-level reasoning and low-level actions are generated within a single token sequence, leading to inefficient exploration and severe error propagation over extended trajectories. In this work, we propose HiMAC, a hierarchical agentic RL framework that explicitly decomposes long-horizon decision-making into macro-level planning and micro-level execution. HiMAC models reasoning as a structured blueprint generation process followed by goal-conditioned action execution, enabling robust long-horizon planning within LLM-based agents. To train this hierarchy efficiently, we introduce a critic-free hierarchical policy optimization paradigm that extends group-based reinforcement learning to bi-level structures through hierarchical relative advantage estimation. Furthermore, we propose an iterative co-evolution training strategy that alternates between planner exploration and executor adaptation, mitigating the non-stationarity inherent in hierarchical learning. Extensive experiments on ALFWorld, WebShop, and Sokoban demonstrate that HiMAC consistently outperforms strong prompting and reinforcement learning baselines, achieving state-of-the-art performance and substantially improved sample efficiency across both text-based and visually grounded environments. Our results show that introducing structured hierarchy, rather than increasing model scale alone, is a key factor for enabling robust long-horizon agentic intelligence.
Executive Summary
The article introduces HiMAC, a hierarchical agentic RL framework for long-horizon decision-making in large language model (LLM) agents. HiMAC decomposes decision-making into macro-level planning and micro-level execution: a planner generates a structured blueprint of subgoals, and an executor carries out goal-conditioned actions. The framework is trained with a critic-free hierarchical policy optimization paradigm and an iterative co-evolution strategy, yielding state-of-the-art performance and improved sample efficiency on ALFWorld, WebShop, and Sokoban.
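The macro-micro decomposition described above can be sketched as a two-level rollout loop. This is an illustrative sketch, not HiMAC's implementation: `plan` and `execute` are hypothetical stand-ins for the LLM planner and executor, and the toy subgoals are invented for the example.

```python
# Hedged sketch of a HiMAC-style macro-micro rollout (all names hypothetical).
# A macro-level planner proposes a blueprint of subgoals; a micro-level
# executor emits primitive actions conditioned on each subgoal in turn.

def plan(observation):
    # Stand-in for the macro-level LLM planner: one structured blueprint.
    return ["find(key)", "unlock(door)", "exit(room)"]

def execute(subgoal, observation):
    # Stand-in for the micro-level executor: goal-conditioned actions.
    return [f"step_toward({subgoal})", f"do({subgoal})"]

def rollout(initial_obs):
    trajectory = []
    blueprint = plan(initial_obs)      # macro level: plan once
    for subgoal in blueprint:          # micro level: act per subgoal
        trajectory.extend(execute(subgoal, initial_obs))
    return trajectory

traj = rollout("room with locked door")
```

The point of the structure is that errors at the micro level stay scoped to one subgoal instead of propagating through a single flat token sequence.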
Key Points
- ▸ HiMAC is a hierarchical agentic RL framework for long-horizon decision-making
- ▸ The framework decomposes decision-making into macro-level planning and micro-level execution
- ▸ HiMAC uses a critic-free hierarchical policy optimization paradigm and an iterative co-evolution training strategy
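The abstract says the critic-free paradigm extends group-based RL to bi-level structures via hierarchical relative advantage estimation. A minimal sketch of what group-relative (GRPO-style) advantage estimation looks like at two levels is below; the exact estimator HiMAC uses is not specified in the abstract, so the normalization formula here is an assumption.

```python
# Hedged sketch of critic-free, group-relative advantage estimation applied
# at two levels. The (r - mean) / std normalization is a GRPO-style
# assumption; HiMAC's actual hierarchical estimator may differ.
import statistics

def group_relative_advantages(returns):
    # Advantage of each sample relative to its own group: no learned critic,
    # just group statistics over rollouts of the same task/subgoal.
    mu = statistics.mean(returns)
    sigma = statistics.pstdev(returns) or 1.0  # guard against zero std
    return [(r - mu) / sigma for r in returns]

# Macro level: one return per sampled blueprint for the same task.
macro_adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])

# Micro level: one return per sampled action sequence for the same subgoal.
micro_adv = group_relative_advantages([0.2, 0.8])
```

Because advantages are computed purely from within-group statistics, no value network has to be trained at either level, which is what "critic-free" buys here.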
Merits
Improved Sample Efficiency
HiMAC achieves substantially improved sample efficiency across both text-based and visually grounded environments.
Robust Long-Horizon Planning
HiMAC enables robust long-horizon planning within LLM-based agents, outperforming strong prompting and reinforcement learning baselines.
Demerits
Complexity
The hierarchical structure and the critic-free hierarchical policy optimization paradigm add implementation and training complexity relative to flat autoregressive policies.
Expert Commentary
HiMAC represents a notable step in the development of LLM agents, targeting robust long-horizon decision-making with improved sample efficiency. Its hierarchical structure is a promising response to the limitations of flat autoregressive policies, and its critic-free hierarchical policy optimization paradigm offers a novel approach to the challenges of training hierarchical RL frameworks. Further research is still needed to probe the framework's added complexity, its failure modes, and how broadly the approach generalizes beyond the benchmarks reported.
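One of the training challenges mentioned above, the non-stationarity of jointly learned levels, is what the abstract's iterative co-evolution strategy addresses: planner and executor are updated in alternation rather than simultaneously. A minimal sketch of that schedule, with hypothetical placeholder update functions:

```python
# Hedged sketch of the iterative co-evolution schedule from the abstract:
# alternate planner updates (executor frozen) with executor updates
# (planner frozen). The update functions are hypothetical stand-ins.

def train_planner(planner, executor):
    # Placeholder for a planner RL update against a frozen executor.
    return planner + ["planner_update"]

def train_executor(planner, executor):
    # Placeholder for an executor RL update against a frozen planner.
    return executor + ["executor_update"]

def co_evolve(rounds):
    planner, executor, schedule = [], [], []
    for _ in range(rounds):
        planner = train_planner(planner, executor)    # executor held fixed
        schedule.append("planner")
        executor = train_executor(planner, executor)  # planner held fixed
        schedule.append("executor")
    return planner, executor, schedule

planner, executor, schedule = co_evolve(2)
```

Holding one level fixed while the other learns keeps each policy's training target quasi-stationary, which is the standard rationale for alternating schemes in hierarchical RL.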
Recommendations
- ✓ Further research should be conducted to explore the applicability of HiMAC to various domains and to investigate the potential benefits and limitations of the framework.
- ✓ The development of HiMAC should be accompanied by efforts to improve the explainability and transparency of LLM agents, ensuring that the decision-making process is understandable and trustworthy.