Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents
arXiv:2602.12662v1. Abstract: Large language models (LLMs) are increasingly deployed as autonomous agents for multi-turn decision-making tasks. However, current agents typically rely on fixed cognitive patterns: non-thinking models generate immediate responses, while thinking models engage in deep reasoning uniformly. This rigidity is inefficient for long-horizon tasks, where cognitive demands vary significantly from step to step, with some requiring strategic planning and others only routine execution. In this paper, we introduce CogRouter, a framework that trains agents to dynamically adapt cognitive depth at each step. Grounded in ACT-R theory, we design four hierarchical cognitive levels ranging from instinctive responses to strategic planning. Our two-stage training approach includes Cognition-aware Supervised Fine-tuning (CoSFT) to instill stable level-specific patterns, and Cognition-aware Policy Optimization (CoPO) for step-level credit assignment via confidence-aware advantage reweighting. The key insight is that appropriate cognitive depth should maximize the confidence of the resulting action. Experiments on ALFWorld and ScienceWorld demonstrate that CogRouter achieves state-of-the-art performance with superior efficiency. With Qwen2.5-7B, it reaches an 82.3% success rate, outperforming GPT-4o (+40.3%), OpenAI-o3 (+18.3%), and GRPO (+14.0%), while using 62% fewer tokens.
Executive Summary
The article 'Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents' introduces CogRouter, a framework that trains large language model (LLM) agents to adapt their cognitive depth at each decision-making step instead of reasoning at one fixed depth throughout a task. Drawing on ACT-R theory, it defines four hierarchical cognitive levels, from instinctive responses to strategic planning. Training proceeds in two stages: Cognition-aware Supervised Fine-tuning (CoSFT) instills stable level-specific patterns, and Cognition-aware Policy Optimization (CoPO) assigns credit at the step level through confidence-aware advantage reweighting. The guiding idea is that the appropriate depth for a step is the one that maximizes the confidence of the resulting action. On ALFWorld and ScienceWorld, CogRouter with Qwen2.5-7B reaches an 82.3% success rate, ahead of GPT-4o, OpenAI-o3, and GRPO, while using 62% fewer tokens.
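The selection principle in the summary (prefer the depth whose resulting action is most confident) can be illustrated with a minimal sketch. It is not the authors' implementation: CogRouter trains the agent to choose a level itself, whereas this toy example enumerates candidates explicitly. The `Candidate` type, the `select_depth` helper, the two middle level names, the tie-breaking rule, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical labels: the source names only the two ends of the range
# (instinctive responses, strategic planning), not the middle levels.
LEVELS = ("instinctive", "level_2", "level_3", "strategic_planning")


@dataclass(frozen=True)
class Candidate:
    level: str         # one of LEVELS
    action: str        # action proposed after reasoning at that level
    confidence: float  # assumed score in [0, 1] for the proposed action
    tokens: int        # tokens spent producing the action


def select_depth(candidates):
    """Pick the candidate with the highest action confidence.

    Ties are broken in favour of the cheaper candidate (fewer tokens);
    this tie-break is an assumption of the sketch, not taken from the paper.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: (c.confidence, -c.tokens))


# Illustrative numbers only, not results from the paper.
step_candidates = [
    Candidate("instinctive", "open drawer 1", 0.91, 12),
    Candidate("level_2", "open drawer 1", 0.91, 85),
    Candidate("strategic_planning", "go to desk 1", 0.64, 410),
]
best = select_depth(step_candidates)
print(best.level, best.tokens)  # prints "instinctive 12"
```

In this toy step the shallow and mid-level candidates agree on the action with equal confidence, so the cheaper one is chosen. This is the kind of routine step where uniform deep reasoning would spend tokens without changing the outcome.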
Key Points
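- Introduces CogRouter, a framework that trains LLM agents to adapt cognitive depth at each step.
- Defines four hierarchical cognitive levels, grounded in ACT-R theory, ranging from instinctive responses to strategic planning.
- Trains in two stages: CoSFT to instill stable level-specific patterns, then CoPO for step-level credit assignment via confidence-aware advantage reweighting.
- Reports state-of-the-art results on ALFWorld and ScienceWorld, ahead of GPT-4o (+40.3%), OpenAI-o3 (+18.3%), and GRPO (+14.0%).
- Reaches an 82.3% success rate with Qwen2.5-7B while using 62% fewer tokens.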
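The abstract describes CoPO only as step-level credit assignment via confidence-aware advantage reweighting and does not give a formula. The sketch below shows one plausible reading, under stated assumptions: a GRPO-style group-normalized advantage is computed per trajectory, then spread over that trajectory's steps in proportion to a per-step confidence score. The function names, the proportional weighting, and the example numbers are all hypothetical and should not be taken as the paper's objective.

```python
import statistics


def group_normalized_advantages(returns):
    """GRPO-style advantage: (return - group mean) / group std."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)
    if std == 0:
        # All rollouts scored the same, so there is no learning signal.
        return [0.0 for _ in returns]
    return [(r - mean) / std for r in returns]


def reweight_steps(trajectory_advantage, step_confidences):
    """Spread one trajectory-level advantage over its steps.

    Assumed rule: each step's weight is its confidence divided by the mean
    confidence, so the weights average to 1 and more confident steps
    receive a larger share of the credit (or blame).
    """
    n = len(step_confidences)
    if n == 0:
        return []
    total = sum(step_confidences)
    if total == 0:
        return [trajectory_advantage for _ in step_confidences]
    return [trajectory_advantage * c * n / total for c in step_confidences]


# Illustrative values only: four rollouts of one task, success = 1.0.
group_returns = [1.0, 0.0, 1.0, 0.0]
advantages = group_normalized_advantages(group_returns)

# Hypothetical per-step confidences for the first rollout.
step_advantages = reweight_steps(advantages[0], [0.9, 0.3, 0.6])
print(advantages)
print([round(a, 3) for a in step_advantages])
```

Without the reweighting step, every step in a trajectory would share the same advantage. That is the coarse credit assignment that step-level methods aim to refine.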
Merits
Innovative Framework
CogRouter introduces a mechanism for adapting cognitive depth at each step of a task. This addresses the rigidity of current agents, which either respond immediately (non-thinking models) or apply deep reasoning uniformly at every step (thinking models), regardless of how demanding the step is.
Theoretical Foundation
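The four hierarchical cognitive levels are grounded in ACT-R (Adaptive Control of Thought-Rational), an established cognitive architecture from psychology. Anchoring the levels in an existing theory of cognition gives the design a principled basis rather than an ad hoc split between "thinking" and "non-thinking" modes.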
Empirical Success
On ALFWorld and ScienceWorld, CogRouter with Qwen2.5-7B reaches an 82.3% success rate, outperforming GPT-4o (+40.3%), OpenAI-o3 (+18.3%), and GRPO (+14.0%) while using 62% fewer tokens. Gains in both success rate and token efficiency support the claim that step-level depth adaptation is effective.
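The abstract does not say whether the reported gains are absolute percentage points or relative improvements, nor which baseline the 62% token reduction is measured against. The arithmetic below works through both readings as a sketch. The derived baseline figures are implications of each assumption, not numbers reported by the paper, and the helper names are hypothetical.

```python
COGROUTER_SUCCESS = 82.3  # success rate in percent, from the abstract
REPORTED_GAINS = {"GPT-4o": 40.3, "OpenAI-o3": 18.3, "GRPO": 14.0}
TOKEN_REDUCTION = 0.62    # "62% fewer tokens", baseline unspecified


def implied_baseline_absolute(success, gain):
    """Baseline success if the gain is in absolute percentage points."""
    return round(success - gain, 1)


def implied_baseline_relative(success, gain):
    """Baseline success if the gain is a relative improvement in percent."""
    return round(success / (1 + gain / 100), 1)


def tokens_after_reduction(baseline_tokens):
    """Tokens used after a 62% reduction from a given baseline count."""
    return baseline_tokens * (1 - TOKEN_REDUCTION)


for name, gain in REPORTED_GAINS.items():
    print(
        name,
        implied_baseline_absolute(COGROUTER_SUCCESS, gain),
        implied_baseline_relative(COGROUTER_SUCCESS, gain),
    )

# Hypothetical baseline of 1000 tokens per episode, for scale only.
print(round(tokens_after_reduction(1000)))
```

The two readings imply noticeably different baseline strengths, so the full paper's results tables are the place to confirm which one applies.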
Demerits
Complexity
The two-stage training approach, while effective, adds complexity to the training pipeline: CoSFT and CoPO must each be set up and run in sequence. This may raise computational and implementation costs, potentially limiting the framework's accessibility.
Generalizability
The experiments cover two environments, ALFWorld and ScienceWorld, both of which are text-based simulated settings. These may not represent the diverse range of real-world applications, so further research is needed to assess how well the framework generalizes.
Token Efficiency
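The reported 62% reduction in tokens comes together with a higher success rate on the tested benchmarks, so no quality loss is visible there. In other contexts, however, routing a step to a shallower level than it needs could reduce the depth of reasoning and affect the quality of decision-making.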
Expert Commentary
The article addresses a clear gap in current autonomous AI agents: their cognitive depth is fixed, even though long-horizon tasks mix steps that need strategic planning with steps that need only routine execution. CogRouter responds with four cognitive levels grounded in ACT-R theory and a two-stage training approach (CoSFT followed by CoPO) that teaches the agent to choose a level at each step. The empirical results are strong: an 82.3% success rate with Qwen2.5-7B on ALFWorld and ScienceWorld, with 62% fewer tokens. However, the complexity of the training pipeline and the need for validation in more diverse real-world scenarios are notable limitations. The practical implications are substantial for industries that rely on AI agents, since lower token usage translates directly into lower inference cost. From a policy perspective, the work illustrates the value of pairing capability gains with resource efficiency, which matters for sustainable AI development. Overall, the article makes a significant contribution and offers a useful reference point for future research on autonomous AI agents.
Recommendations
- Further research should focus on validating the CogRouter framework in a broader range of real-world applications to assess its generalizability and robustness.
- Efforts should be made to simplify the training process and reduce computational complexity to enhance the framework's accessibility and scalability.