Graph-Based Chain-of-Thought Pruning for Reducing Redundant Reflections in Reasoning LLMs
arXiv:2604.05643v1 Announce Type: new Abstract: Extending CoT through RL has been widely used to enhance the reasoning capabilities of LLMs. However, due to the sparsity of reward signals, it can also induce undesirable thinking patterns such as overthinking, i.e., generating redundant intermediate reasoning content. In this work, we argue that a major source of such redundancy is inefficient reflection, which often manifests in two problematic patterns: Indiscriminate Reflection, where the model performs broad, low-impact checks throughout reasoning, and Repetitive Reflection, where it repeatedly re-verifies an already established conclusion. To address this, we introduce a graph-based CoT optimization framework. Specifically, we convert each linear CoT into a directed acyclic graph (DAG) with explicit dependency edges, and design a dual pruning strategy: branch-level pruning removes weakly contributing reflection branches, while depth-level pruning eliminates late-stage re-verification. We distill this behavior via a three-stage pipeline: (1) SFT to initialize the policy on pruned concise traces, (2) DPO to prefer correct but less redundant trajectories, and (3) GRPO with length penalty to jointly optimize answer correctness and efficiency. Experiments show that our approach reduces the average reasoning tokens by 42% while maintaining or improving accuracy.
Executive Summary
This article presents a novel framework to address the inefficiency of overthinking in reasoning large language models (LLMs) by introducing a graph-based chain-of-thought (CoT) pruning method. The authors identify two primary sources of redundancy—indiscriminate and repetitive reflection—and propose a dual pruning strategy to optimize reasoning traces. Their approach converts linear CoT into a directed acyclic graph (DAG) to explicitly model dependencies and applies branch-level and depth-level pruning to eliminate weak or redundant reasoning steps. Through a three-stage distillation pipeline involving supervised fine-tuning (SFT), direct preference optimization (DPO), and group relative policy optimization (GRPO) with length penalties, the authors demonstrate a 42% reduction in reasoning tokens while maintaining or improving answer accuracy. This work contributes to the broader discourse on efficient reasoning in LLMs, particularly in resource-constrained or latency-sensitive applications.
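To make the dual pruning strategy concrete, the following is a minimal, hypothetical sketch: the paper's actual step labels, contribution-scoring function, and threshold are not given in the abstract, so `kind`, `score`, `tau`, and the `verified` mapping below are all illustrative assumptions. Branch-level pruning drops low-scoring reflection steps that nothing else depends on; depth-level pruning keeps only the first verification of each conclusion.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    sid: int
    kind: str                    # "deduce" or "reflect" (assumed labels)
    score: float                 # assumed per-step contribution score
    deps: list = field(default_factory=list)  # ids this step depends on

def build_dag(steps):
    """Index steps by id; `deps` lists are the explicit dependency edges."""
    return {s.sid: s for s in steps}

def branch_prune(dag, tau=0.2):
    """Branch-level pruning: drop reflection steps whose contribution
    score falls below tau, provided no surviving step depends on them."""
    needed = {d for s in dag.values() for d in s.deps}
    return {i: s for i, s in dag.items()
            if not (s.kind == "reflect" and s.score < tau and i not in needed)}

def depth_prune(dag, verified):
    """Depth-level pruning: `verified` maps a step id to the conclusion it
    re-checks; only the first verification of each conclusion is kept."""
    seen, keep = set(), {}
    for i in sorted(dag):
        tgt = verified.get(i)
        if tgt is not None and tgt in seen:
            continue             # redundant late-stage re-verification
        if tgt is not None:
            seen.add(tgt)
        keep[i] = dag[i]
    return keep

# Example: step 1 is a low-impact check, step 4 re-verifies conclusion "c".
steps = [
    Step(0, "deduce", 1.0),
    Step(1, "reflect", 0.05, deps=[0]),
    Step(2, "deduce", 0.9, deps=[0]),
    Step(3, "reflect", 0.8, deps=[2]),
    Step(4, "reflect", 0.7, deps=[2]),
]
pruned = depth_prune(branch_prune(build_dag(steps), tau=0.2),
                     verified={3: "c", 4: "c"})
```

In this toy trace, branch-level pruning removes step 1 and depth-level pruning removes step 4, leaving the deductive backbone plus one verification.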
Key Points
- ▸ Introduces a graph-based CoT optimization framework to mitigate redundant reasoning in LLMs by converting linear CoT into a DAG with explicit dependency edges.
- ▸ Identifies and addresses two inefficiencies in reflection: indiscriminate reflection (broad, low-impact checks) and repetitive reflection (redundant re-verification of conclusions).
- ▸ Proposes a dual pruning strategy—branch-level pruning for weakly contributing branches and depth-level pruning for late-stage re-verification—to streamline reasoning traces.
- ▸ Develops a three-stage distillation pipeline (SFT, DPO, GRPO) to optimize both correctness and efficiency, achieving a 42% reduction in reasoning tokens with no loss in accuracy.
- ▸ Demonstrates the practical applicability of the method across diverse reasoning tasks, highlighting its potential for deployability in real-world scenarios.
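The third stage of the pipeline, GRPO with a length penalty, can be sketched as a group-relative advantage computation. The linear penalty form and the coefficient `lam` below are assumptions for illustration; the abstract does not specify the paper's exact reward shaping.

```python
def grpo_advantages(rewards, lengths, lam=0.001):
    """Group-relative advantages with a length penalty: each sampled
    trajectory's correctness reward is reduced by lam * token_count,
    then standardized within the sampled group (the 'group relative'
    normalization characteristic of GRPO)."""
    shaped = [r - lam * n for r, n in zip(rewards, lengths)]
    mu = sum(shaped) / len(shaped)
    var = sum((x - mu) ** 2 for x in shaped) / len(shaped)
    sd = var ** 0.5 or 1.0       # guard against a zero-variance group
    return [(x - mu) / sd for x in shaped]

# Two correct and two incorrect samples; among the correct ones,
# the shorter trajectory (200 tokens) earns the larger advantage.
advs = grpo_advantages(rewards=[1, 1, 0, 0],
                       lengths=[200, 400, 300, 300], lam=0.001)
```

This captures the intended training pressure: among trajectories that answer correctly, the shorter one is preferred, while incorrect trajectories receive negative advantages regardless of length.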
Merits
Innovative Methodology
The graph-based approach to CoT pruning is a significant departure from methods that treat the trace as a flat token sequence: by making each step's dependencies explicit as DAG edges, it allows the contribution of individual reasoning branches to be evaluated and pruned in a structured way. This methodological innovation is broadly applicable to improving the efficiency of reasoning LLMs.
Empirical Robustness
The three-stage distillation pipeline (SFT, DPO, GRPO) is meticulously designed and empirically validated, showing a substantial reduction in reasoning tokens (42%) while maintaining or improving accuracy. This demonstrates the robustness and scalability of the approach.
Addressing Core Challenges
The paper directly tackles a critical inefficiency in LLMs—redundant reflection—by identifying its root causes (indiscriminate and repetitive reflection) and proposing targeted solutions. This addresses a gap in existing research on reasoning optimization.
Practical Applicability
The framework is designed with practical deployment in mind, as evidenced by the focus on reducing token usage without sacrificing accuracy. This makes it particularly relevant for resource-constrained or latency-sensitive applications, such as real-time decision-making systems.
Demerits
Complexity of Implementation
The proposed graph-based framework and three-stage distillation pipeline introduce significant complexity in both training and deployment. This may limit accessibility for researchers or organizations without advanced computational resources or expertise in graph-based methods.
Dependence on Reward Signal Sparsity
The paper assumes that reward signal sparsity is the primary driver of redundant reflection. However, other factors, such as model architecture or training data quality, may also contribute to inefficiencies. This could limit the generalizability of the proposed solution in contexts where sparsity is not the dominant issue.
Limited Generalization to Non-Reasoning Tasks
While the framework is highly effective for reasoning tasks, its applicability to non-reasoning tasks (e.g., generative tasks or tasks requiring creative output) remains untested. This may restrict its broader utility in multimodal or hybrid AI systems.
Potential Over-Pruning Risks
The dual pruning strategy, while effective, carries the risk of over-pruning, where critical reasoning steps are inadvertently removed. This could lead to a decline in performance on complex or nuanced tasks where intermediate steps are essential for accurate reasoning.
Expert Commentary
This paper represents a significant advancement in the quest to optimize reasoning in large language models by introducing a graph-based framework to prune redundant reflections. The authors' identification of indiscriminate and repetitive reflection as key inefficiencies is both insightful and actionable, addressing a longstanding challenge in the field. The dual pruning strategy, combined with a rigorous three-stage distillation pipeline, demonstrates not only technical sophistication but also empirical robustness, as evidenced by the substantial reduction in reasoning tokens without sacrificing accuracy. However, the complexity of the implementation and the potential risks of over-pruning warrant careful consideration. The framework's reliance on reward signal sparsity as the primary driver of redundancy may also limit its applicability in contexts where other factors dominate. That said, the paper's contributions are undeniable, particularly in their potential to enhance the deployability of reasoning LLMs in resource-constrained environments. This work sets a new benchmark for future research in efficient reasoning and graph-based optimization in AI systems.
Recommendations
- ✓ Further research should explore the generalizability of the graph-based pruning framework to non-reasoning tasks, such as creative generation or multimodal reasoning, to assess its broader utility in hybrid AI systems.
- ✓ Developers should invest in tools and frameworks that simplify the implementation of graph-based CoT pruning, making the methodology more accessible to practitioners without advanced computational resources or expertise in graph theory.
- ✓ Future studies should investigate the long-term effects of pruning on model behavior, particularly in high-stakes domains like healthcare or law, to ensure that efficiency gains do not come at the expense of interpretability or ethical considerations.
- ✓ Collaborations between academia, industry, and policymakers are encouraged to develop standardized benchmarks for measuring token efficiency and reasoning quality, ensuring consistent and comparable evaluations across different optimization techniques.
- ✓ Organizations deploying reasoning LLMs should conduct thorough empirical evaluations to determine the optimal balance between pruning intensity and task complexity, mitigating the risk of over-pruning and ensuring robust performance across diverse applications.
Sources
Original: arXiv - cs.CL