Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction
arXiv:2603.04783v1 Announce Type: new Abstract: While LLMs demonstrate strong reasoning capabilities when provided with full information in a single turn, they exhibit substantial vulnerability in multi-turn interactions. Specifically, when information is revealed incrementally or requires updates, models frequently fail to integrate new constraints, leading to a collapse in performance compared to their single-turn baselines. We identify the root cause as Contextual Inertia: a phenomenon where models rigidly adhere to previous reasoning traces. Even when users explicitly provide corrections or new data in later turns, the model ignores them, preferring to maintain consistency with its previous (incorrect) reasoning path. To address this, we introduce Reinforcement Learning with Single-Turn Anchors (RLSTA), a generalizable training approach designed to stabilize multi-turn interaction across diverse scenarios and domains. RLSTA leverages the model's superior single-turn capabilities as stable internal anchors to provide reward signals. By aligning multi-turn responses with these anchors, RLSTA empowers models to break contextual inertia and self-calibrate their reasoning based on the latest information. Experiments show that RLSTA significantly outperforms standard fine-tuning and abstention-based methods. Notably, our method exhibits strong cross-domain generalization (e.g., math to code) and proves effective even without external verifiers, highlighting its potential for general-domain applications.
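The abstract's core idea, scoring a multi-turn rollout against the model's own answer to a single prompt that contains all the information, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `extract_final_answer` heuristic and the exact-match reward are assumptions standing in for whatever answer extraction and comparison the authors actually use.

```python
# Hypothetical sketch of a single-turn-anchor reward. The "anchor" is the
# model's answer to a single-turn prompt containing all constraints up front;
# a multi-turn rollout (constraints revealed incrementally) is rewarded for
# reaching the same final answer.

def extract_final_answer(response: str) -> str:
    """Toy answer extractor: take the last non-empty line, normalized."""
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    return lines[-1].lower() if lines else ""

def anchor_reward(multi_turn_response: str, single_turn_anchor: str) -> float:
    """1.0 when the multi-turn answer matches the anchor, else 0.0.
    A real system might instead use a verifier or a soft similarity score."""
    return float(extract_final_answer(multi_turn_response)
                 == extract_final_answer(single_turn_anchor))

# The anchor comes from a single-turn, full-information prompt; the rollout
# saw the same constraints spread across turns.
anchor = "Combining both constraints, x must be even and below 10.\nx = 8"
rollout = "Updating with the new constraint from turn 2:\nx = 8"
print(anchor_reward(rollout, anchor))  # → 1.0
```

A rollout that ignores the turn-2 correction and sticks with its earlier answer would score 0.0, which is exactly the behavior the reward is meant to penalize.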
Executive Summary
This article proposes a novel training approach, Reinforcement Learning with Single-Turn Anchors (RLSTA), to address the issue of Contextual Inertia in Large Language Models (LLMs) during multi-turn interactions. Contextual Inertia refers to the phenomenon where LLMs rigidly adhere to previous reasoning traces, ignoring new information or corrections. RLSTA leverages the model's single-turn capabilities as stable internal anchors to provide reward signals, enabling the model to break Contextual Inertia and self-calibrate its reasoning based on the latest information. Experiments demonstrate that RLSTA outperforms standard fine-tuning and abstention-based methods, showing strong cross-domain generalization (e.g., math to code) and remaining effective without external verifiers. These results have significant implications for building more robust and adaptive LLMs.
Key Points
- ▸ Contextual Inertia: a phenomenon where LLMs rigidly adhere to previous reasoning traces
- ▸ RLSTA: a novel training approach to address Contextual Inertia
- ▸ Single-turn anchors as reward signals for multi-turn interactions
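To make the third point concrete, here is one plausible way anchor-derived rewards could feed a policy update, using a group-normalized advantage in the style of GRPO-like RL methods for LLMs. The paper does not specify its objective in this summary, so the normalization choice and the toy reward values below are assumptions for illustration only.

```python
# Hypothetical sketch: anchor-match rewards (1.0 = multi-turn answer agrees
# with the single-turn anchor, 0.0 = it does not) turned into per-rollout
# advantages by normalizing within a group of sampled rollouts.

def group_advantages(rewards: list[float]) -> list[float]:
    """Normalize a group of rollout rewards to zero mean, unit std."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5 or 1.0
    return [(r - mean) / std for r in rewards]

# Four multi-turn rollouts scored against the same single-turn anchor:
rewards = [1.0, 0.0, 1.0, 1.0]
advs = group_advantages(rewards)
# Rollouts that agree with the anchor receive positive advantage; the one
# that clung to its earlier reasoning receives a negative one, so the
# policy gradient pushes the model toward integrating the latest turn.
```

The key design point is that the reward signal is internal: it comes from the model's own single-turn behavior rather than a human label or external verifier, which is what lets the approach generalize across domains.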
Merits
Strength
RLSTA demonstrates strong cross-domain generalization and outperforms standard fine-tuning and abstention-based methods.
Demerits
Limitation
The method relies on the model's single-turn capabilities being strong enough to serve as reliable anchors; in domains or scenarios where single-turn performance is itself weak, the reward signal may degrade accordingly.
Expert Commentary
The article presents a thought-provoking solution to the issue of Contextual Inertia in LLMs. The proposed RLSTA approach is well-designed and demonstrates impressive results. However, further research is needed to explore the limitations and potential biases of this method. Additionally, the article's focus on single-turn anchors as reward signals raises interesting questions about the role of human feedback and the importance of domain-specific knowledge in training LLMs. Overall, this article is a significant contribution to the field of natural language processing and has the potential to shape the future of LLM development.
Recommendations
- ✓ Further research is needed to explore the limitations and potential biases of RLSTA
- ✓ Investigation of the role of human feedback and domain-specific knowledge in training LLMs using RLSTA