Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning
arXiv:2603.08999v1 Announce Type: new Abstract: Large language models (LLMs) achieve strong reasoning performance through chain-of-thought (CoT) reasoning, yet often generate unnecessarily long reasoning paths that incur high inference cost. Recent self-consistency-based approaches further improve accuracy but require sampling and aggregating multiple reasoning trajectories, leading to substantial additional computational overhead. This paper introduces a confidence-aware decision framework that analyzes a single completed reasoning trajectory to adaptively select between single-path and multi-path reasoning. The framework is trained using sentence-level numeric and linguistic features extracted from intermediate reasoning states in the MedQA dataset and generalizes effectively to MathQA, MedMCQA, and MMLU without additional fine-tuning. Experimental results show that the proposed method maintains accuracy comparable to multi-path baselines while using up to 80% fewer tokens. These findings demonstrate that reasoning trajectories contain rich signals for uncertainty estimation, enabling a simple, transferable mechanism to balance accuracy and efficiency in LLM reasoning.
Executive Summary
This study introduces a confidence-aware self-consistency framework for efficient large language model (LLM) chain-of-thought reasoning. The method analyzes a single completed reasoning trajectory and adaptively decides whether to stop there or to sample additional paths, matching the accuracy of multi-path baselines while using up to 80% fewer tokens. The decision model is trained on sentence-level numeric and linguistic features extracted from intermediate reasoning states in the MedQA dataset and generalizes to MathQA, MedMCQA, and MMLU without additional fine-tuning, making it a practical mechanism for balancing accuracy and inference cost in LLM-based reasoning systems.
Key Points
- ▸ The proposed confidence-aware self-consistency framework improves the efficiency of LLM chain-of-thought reasoning.
- ▸ The framework matches the accuracy of multi-path baselines while using up to 80% fewer tokens.
- ▸ The method generalizes effectively to other datasets without additional fine-tuning.
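Concretely, the decision mechanism summarized above can be sketched as a small router: extract cheap features from one finished chain-of-thought trace, score them with a trained confidence model, and sample additional paths only when the score falls below a threshold. The sketch below is an illustrative reading of that idea, not the authors' implementation; the feature set, the logistic scoring function, and all names (`trajectory_features`, `answer_with_budget`, the 0.7 threshold) are assumptions.

```python
import math
from collections import Counter
from typing import Callable

def trajectory_features(sentences: list[str]) -> list[float]:
    """Toy sentence-level numeric/linguistic features from one CoT trace."""
    n = max(len(sentences), 1)
    hedges = ("might", "perhaps", "possibly", "not sure")  # hypothetical cue words
    hedge_rate = sum(any(h in s.lower() for h in hedges) for s in sentences) / n
    avg_words = sum(len(s.split()) for s in sentences) / n
    return [hedge_rate, avg_words / 50.0, n / 20.0]

def confidence(features: list[float], weights: list[float], bias: float) -> float:
    """Linear score squashed to (0, 1); a stand-in for the trained classifier."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def answer_with_budget(
    sample_path: Callable[[], tuple[list[str], str]],  # -> (sentences, answer)
    weights: list[float],
    bias: float,
    threshold: float = 0.7,
    k: int = 5,
) -> str:
    # One CoT trajectory is always generated and analyzed.
    sentences, answer = sample_path()
    if confidence(trajectory_features(sentences), weights, bias) >= threshold:
        return answer  # confident: keep the single path, skip extra sampling
    # Low confidence: fall back to self-consistency with majority voting.
    answers = [answer] + [sample_path()[1] for _ in range(k - 1)]
    return Counter(answers).most_common(1)[0][0]
```

The key cost property is that the expensive k-path sampling is paid only on the subset of questions the classifier flags as uncertain, which is what drives the reported token savings.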
Merits
Strength in Efficiency
By defaulting to a single reasoning path when confidence is high, the framework uses up to 80% fewer tokens than multi-path self-consistency baselines.
Improved Generalizability
The decision model, though trained only on MedQA, generalizes to MathQA, MedMCQA, and MMLU without additional fine-tuning, demonstrating its transferability across domains.
Comparable Accuracy
The proposed method achieves accuracy comparable to multi-path baselines, showing that the efficiency gains do not come at the cost of reasoning performance.
Demerits
Limited Dataset Scope
The decision model is trained solely on MedQA; although it transfers well to the evaluated benchmarks, generalization to domains far from the training distribution remains untested.
Dependence on Intermediate States
The framework relies on sentence-level features extracted from intermediate reasoning states, which may not be available or directly comparable across all LLM-based reasoning systems.
Expert Commentary
The proposed confidence-aware self-consistency framework is a meaningful step toward efficient LLM reasoning. By triggering multi-path sampling only when a single trajectory looks unreliable, it preserves the accuracy of self-consistency at a fraction of the token cost. The transfer results from MedQA to MathQA, MedMCQA, and MMLU suggest that uncertainty signals in reasoning trajectories are fairly domain-general, which is encouraging for lightweight routing approaches. The main caveats are the framework's reliance on accessible intermediate reasoning states and its training on a single dataset, both of which could limit applicability to other model families and task formats. Even so, the study offers practical guidance for building cost-aware reasoning pipelines.
Recommendations
- ✓ Researchers should investigate extending the proposed framework to other LLM-based reasoning systems and explore its applicability to additional datasets and domains.
- ✓ Developers of LLM-based reasoning systems should consider incorporating confidence-aware self-consistency mechanisms to improve the efficiency and accuracy of their systems.