Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty
arXiv:2604.04182v1 Announce Type: new Abstract: Non-stationary environments require agents to revise previously learned action values when contingencies change. We treat large language models (LLMs) as sequential decision policies in a two-option probabilistic reversal-learning task with three latent states and switch events triggered by either a performance criterion or timeout. We compare a deterministic fixed transition cycle to a stochastic random schedule that increases volatility, and evaluate DeepSeek-V3.2, Gemini-3, and GPT-5.2, with human data as a behavioural reference. Across models, win-stay was near ceiling while lose-shift was markedly attenuated, revealing asymmetric use of positive versus negative evidence. DeepSeek-V3.2 showed extreme perseveration after reversals and weak acquisition, whereas Gemini-3 and GPT-5.2 adapted more rapidly but still remained less loss-sensitive than humans. Random transitions amplified reversal-specific persistence across LLMs yet did not uniformly reduce total wins, demonstrating that high aggregate payoff can coexist with rigid adaptation. Hierarchical reinforcement-learning (RL) fits indicate dissociable mechanisms: rigidity can arise from weak loss learning, inflated policy determinism, or value polarisation via counterfactual suppression. These results motivate reversal-sensitive diagnostics and volatility-aware models for evaluating LLMs under non-stationary uncertainty.
Executive Summary
This article investigates how well large language models (LLMs) adapt to changing environments using a probabilistic reversal-learning task. Three LLMs (DeepSeek-V3.2, Gemini-3, and GPT-5.2) are benchmarked against human data; while the models adapt to some extent, they exhibit rigid adaptation and remain less loss-sensitive than humans. High aggregate payoff can coexist with this rigidity, which hierarchical reinforcement-learning fits trace to dissociable mechanisms: weak loss learning, inflated policy determinism, or value polarisation via counterfactual suppression. These findings motivate reversal-sensitive diagnostics and volatility-aware models for evaluating LLMs under non-stationary uncertainty.
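The win-stay and lose-shift rates at the heart of the study's behavioural analysis are simple conditional probabilities over trial sequences. As a minimal sketch (the paper does not publish its analysis code, so function and variable names here are illustrative), they can be computed like this:

```python
import numpy as np

def win_stay_lose_shift(choices, rewards):
    """Compute win-stay and lose-shift rates from trial sequences.

    choices: sequence of chosen options (e.g. 0/1), one entry per trial
    rewards: sequence of outcomes, 1 = win, 0 = loss, one entry per trial
    """
    choices = np.asarray(choices)
    rewards = np.asarray(rewards)
    stay = choices[1:] == choices[:-1]   # did the agent repeat its previous choice?
    win = rewards[:-1] == 1              # was the previous trial rewarded?
    # P(stay | previous win) and P(shift | previous loss)
    win_stay = stay[win].mean() if win.any() else np.nan
    lose_shift = (~stay)[~win].mean() if (~win).any() else np.nan
    return win_stay, lose_shift

# Toy sequence: every win is followed by a stay, every loss by a shift,
# so both rates come out at 1.0
ws, ls = win_stay_lose_shift([0, 0, 1, 1, 1, 0], [1, 0, 1, 1, 0, 0])
```

The reported asymmetry (win-stay near ceiling, lose-shift attenuated) corresponds to `win_stay` close to 1 while `lose_shift` stays well below the human reference.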
Key Points
- ▸ Large language models (LLMs) are investigated in a probabilistic reversal-learning task under non-stationary uncertainty
- ▸ LLMs exhibit rigid adaptation and remain less loss-sensitive than humans
- ▸ High aggregate payoff can coexist with rigid adaptation in LLMs
Merits
Strength
The study provides a comprehensive comparison of LLMs' performance in a probabilistic reversal-learning task, shedding light on their ability to adapt to changing environments.
Strength
The use of human data as a behavioural reference allows for a more nuanced understanding of LLMs' limitations and potential applications.
Demerits
Limitation
The study is limited to a specific task and may not be generalizable to other environments or applications.
Limitation
The findings may have been influenced by the specific LLMs used in the study, which may not represent the broader class of LLMs.
Expert Commentary
This study offers a timely contribution to AI research by exposing the limits of LLM adaptation in non-stationary environments, with clear implications for deploying these models in real-world applications. The human behavioural reference and the well-characterised probabilistic reversal-learning task make the comparison interpretable, and the dissociation of mechanisms behind rigidity (weak loss learning, inflated policy determinism, value polarisation via counterfactual suppression) points directly toward reversal-sensitive diagnostics and volatility-aware evaluation. That said, the focus on a single task and on three specific models limits how far the conclusions generalise to other environments or to the broader class of LLMs, and this should be kept in mind when interpreting the results.
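The mechanisms the commentary highlights can be made concrete with a toy agent. The sketch below (an illustrative delta-rule learner under assumed parameters, not the paper's fitted model) uses separate learning rates for positive and negative prediction errors plus a softmax inverse temperature: a small loss learning rate and a high inverse temperature together produce exactly the post-reversal perseveration described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_agent(alpha_gain, alpha_loss, beta, n_trials=200, reversal_every=50):
    """Two-option bandit with periodic reversals (reward probs 0.8 / 0.2).

    alpha_gain / alpha_loss: learning rates for positive / negative
    prediction errors; beta: softmax inverse temperature (determinism).
    Returns overall accuracy (fraction of choices of the better option).
    """
    q = np.zeros(2)        # action values
    good = 0               # index of the currently better option
    correct = 0
    for t in range(n_trials):
        if t > 0 and t % reversal_every == 0:
            good = 1 - good                              # contingency reversal
        p = np.exp(beta * q) / np.exp(beta * q).sum()    # softmax policy
        choice = rng.choice(2, p=p)
        reward = rng.random() < (0.8 if choice == good else 0.2)
        delta = float(reward) - q[choice]                # prediction error
        alpha = alpha_gain if delta > 0 else alpha_loss  # asymmetric update
        q[choice] += alpha * delta
        correct += int(choice == good)
    return correct / n_trials

# Symmetric learning vs. weak loss learning + inflated determinism:
# the second agent clings to the pre-reversal option far longer.
flexible = simulate_agent(alpha_gain=0.4, alpha_loss=0.4, beta=5.0)
rigid = simulate_agent(alpha_gain=0.4, alpha_loss=0.05, beta=12.0)
```

Note how the rigid agent can still accumulate substantial reward within each block, mirroring the paper's observation that high aggregate payoff can coexist with rigid adaptation.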
Recommendations
- ✓ Further research should be conducted on the development of LLMs that can adapt to changing environments and perform well in real-world applications.
- ✓ The development of reversal-sensitive diagnostics and volatility-aware models should be a priority for LLM researchers and developers.
Sources
Original: arXiv - cs.AI