Do LLMs Benefit From Their Own Words?
arXiv:2602.24287v1 Announce Type: new Abstract: Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we revisit this design choice by asking whether large language models benefit from conditioning on their own prior responses. Using in-the-wild, multi-turn conversations, we compare standard (full-context) prompting with a user-turn-only prompting approach that omits all previous assistant responses, across three open reasoning models and one state-of-the-art model. To our surprise, we find that removing prior assistant responses does not affect response quality on a large fraction of turns. Omitting assistant-side history can reduce cumulative context lengths by up to 10x. To explain this result, we find that multi-turn conversations consist of a substantial proportion (36.4%) of self-contained prompts, and that many follow-up prompts provide sufficient instruction to be answered using only the current user turn and prior user turns. When analyzing cases where user-turn-only prompting substantially outperforms full context, we identify instances of context pollution, in which models over-condition on their previous responses, introducing errors, hallucinations, or stylistic artifacts that propagate across turns. Motivated by these findings, we design a context-filtering approach that selectively omits assistant-side context. Our findings suggest that selectively omitting assistant history can improve response quality while reducing memory consumption.
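The user-turn-only baseline described in the abstract is straightforward to reproduce at the API level: before each request, drop every prior assistant message from the history. A minimal sketch in an OpenAI-style chat-message format (the message schema and example contents are assumptions for illustration, not taken from the paper):

```python
def user_turn_only(messages):
    """Keep the system prompt and all user turns; drop every prior
    assistant response (the paper's user-turn-only prompting)."""
    return [m for m in messages if m["role"] != "assistant"]


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize quicksort."},
    {"role": "assistant", "content": "Quicksort partitions the array..."},
    {"role": "user", "content": "Now give its worst-case complexity."},
]

pruned = user_turn_only(history)
print([m["role"] for m in pruned])  # → ['system', 'user', 'user']
```

Because assistant turns are typically the longest messages in a conversation, this kind of pruning is where the paper's up-to-10x reduction in cumulative context length comes from.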
Executive Summary
This article examines a rarely questioned default in large language model (LLM) interaction design: do LLMs actually benefit from conditioning on their own prior responses in multi-turn conversations? Using in-the-wild conversations across three open reasoning models and one state-of-the-art model, the researchers found that omitting prior assistant responses leaves response quality unchanged on a large fraction of turns, while cutting cumulative context lengths by up to 10x. They attribute this to two factors: a substantial share (36.4%) of prompts are self-contained, and 'context pollution', in which models over-condition on their own earlier output and propagate errors, hallucinations, or stylistic artifacts across turns. The authors propose a context-filtering approach that selectively omits assistant-side context, which their findings suggest can improve response quality while reducing memory consumption. This research contributes to the ongoing debate on LLM design and has implications for building more efficient language model applications.
Key Points
- ▸ The study challenges the conventional wisdom that LLMs benefit from conditioning on their own prior responses.
- ▸ Omitting prior assistant responses does not affect response quality on a large fraction of turns, while reducing cumulative context lengths by up to 10x.
- ▸ The authors identified 'self-contained prompts' and 'context pollution' as key factors influencing the outcome.
Merits
Strength of methodology
The study employs a rigorous experimental design, using in-the-wild, multi-turn conversations to test the hypothesis.
Insightful analysis
The authors provide a nuanced understanding of the dynamics of multi-turn conversations and the factors influencing LLM performance.
Demerits
Limited dataset scope
The study focuses on a specific set of LLMs and conversations, which may limit the generalizability of the findings.
Need for further validation
The proposed context-filtering approach requires further validation and testing to ensure its effectiveness in real-world applications.
Expert Commentary
This study makes a meaningful contribution to natural language processing by questioning a design default that is rarely examined: retaining the assistant's own responses in the conversation history. Its evidence that much of that history is dispensable, and sometimes actively harmful through context pollution, has practical consequences for serving cost and memory consumption in deployed systems. The proposed context-filtering approach is a promising response to context pollution, though its effectiveness still needs to be demonstrated across a wider range of models and real-world workloads than the study covers. The findings should interest researchers, developers, and policymakers working on LLMs and AI adoption.
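The summary does not specify how the authors' context filter decides which assistant turns to omit, so the sketch below is purely illustrative of the general idea of selective omission: retain an assistant turn only when the following user turn appears to refer back to it. The cue pattern and example messages are assumptions for demonstration, not the paper's method.

```python
import re

# Hypothetical back-reference cues; the paper's actual filtering
# criterion is not described in this summary.
BACK_REF = re.compile(r"\b(it|that|this|above|previous|your)\b", re.IGNORECASE)


def filter_assistant_context(messages):
    """Illustrative selective filter: always keep system and user turns;
    keep an assistant turn only if the next user turn contains a cue
    suggesting it refers back to that response."""
    kept = []
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant":
            kept.append(msg)
            continue
        nxt = messages[i + 1] if i + 1 < len(messages) else None
        if nxt is not None and nxt["role"] == "user" and BACK_REF.search(nxt["content"]):
            kept.append(msg)
    return kept


history = [
    {"role": "user", "content": "Explain RAID 5 briefly."},
    {"role": "assistant", "content": "RAID 5 stripes data with parity..."},
    {"role": "user", "content": "Rewrite that as a bullet list."},
    {"role": "assistant", "content": "- Striping with distributed parity..."},
    {"role": "user", "content": "What is RAID 6?"},
]

print([m["role"] for m in filter_assistant_context(history)])
# → ['user', 'assistant', 'user', 'user']
```

Here the first assistant turn survives because the follow-up ("Rewrite that...") refers back to it, while the second is dropped because the final question is self-contained, matching the paper's observation that many follow-up prompts need no assistant-side history at all.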
Recommendations
- ✓ Future studies should investigate the generalizability of the findings across different LLMs and conversation types.
- ✓ The proposed context-filtering approach should be further validated and tested in real-world applications to ensure its effectiveness.