Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty
arXiv:2604.04182v1 Announce Type: new Abstract: Non-stationary environments require agents to revise previously learned action values when contingencies change. We treat large language models (LLMs) as sequential decision policies in a two-option probabilistic reversal-learning task with three latent states and switch events triggered by either a performance criterion or timeout. We compare a deterministic fixed transition cycle to a stochastic random schedule that increases volatility, and evaluate DeepSeek-V3.2, Gemini-3, and GPT-5.2, with human data as a behavioural reference. Across models, win-stay was near ceiling while lose-shift was markedly attenuated, revealing asymmetric use of positive versus negative evidence. DeepSeek-V3.2 showed extreme perseveration after reversals and weak acquisition, whereas Gemini-3 and GPT-5.2 adapted more rapidly but still remained less loss-sensitive than humans. Random transitions amplified reversal-specific persistence across LLMs yet did not uniformly reduce total wins, demonstrating that high aggregate payoff can coexist with rigid adaptation. Hierarchical reinforcement-learning (RL) fits indicate dissociable mechanisms: rigidity can arise from weak loss learning, inflated policy determinism, or value polarisation via counterfactual suppression. These results motivate reversal-sensitive diagnostics and volatility-aware models for evaluating LLMs under non-stationary uncertainty.
Executive Summary
This article investigates how well large language models (LLMs) adapt to changing environments using a probabilistic reversal-learning task. Three LLMs (DeepSeek-V3.2, Gemini-3, and GPT-5.2) are benchmarked against human data; while the models adapt to some extent, they exhibit rigid adaptation and remain less loss-sensitive than humans. High aggregate payoff can coexist with this rigidity, which hierarchical reinforcement-learning fits trace to dissociable mechanisms: weak loss learning, inflated policy determinism, or value polarisation via counterfactual suppression. These findings motivate reversal-sensitive diagnostics and volatility-aware models for evaluating LLMs under non-stationary uncertainty.
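The win-stay and lose-shift rates at the heart of the study's behavioural analysis are simple conditional probabilities over trial sequences. As a minimal sketch (the paper does not publish its analysis code, so function and variable names here are illustrative), they can be computed like this:

```python
import numpy as np

def win_stay_lose_shift(choices, rewards):
    """Compute win-stay and lose-shift rates from trial sequences.

    choices: sequence of chosen options (e.g. 0/1), one entry per trial
    rewards: sequence of outcomes, 1 = win, 0 = loss, one entry per trial
    """
    choices = np.asarray(choices)
    rewards = np.asarray(rewards)
    stay = choices[1:] == choices[:-1]   # did the agent repeat its previous choice?
    win = rewards[:-1] == 1              # was the previous trial rewarded?
    # P(stay | previous win) and P(shift | previous loss)
    win_stay = stay[win].mean() if win.any() else np.nan
    lose_shift = (~stay)[~win].mean() if (~win).any() else np.nan
    return win_stay, lose_shift

# Toy sequence: every win is followed by a stay, every loss by a shift,
# so both rates come out at 1.0
ws, ls = win_stay_lose_shift([0, 0, 1, 1, 1, 0], [1, 0, 1, 1, 0, 0])
```

The reported asymmetry (win-stay near ceiling, lose-shift attenuated) corresponds to `win_stay` close to 1 while `lose_shift` stays well below the human reference.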
Key Points
- ▸ Large language models (LLMs) are investigated in a probabilistic reversal-learning task under non-stationary uncertainty
- ▸ LLMs exhibit rigid adaptation and remain less loss-sensitive than humans
- ▸ High aggregate payoff can coexist with rigid adaptation in LLMs
Merits
Strength
The study provides a comprehensive comparison of LLMs' performance in a probabilistic reversal-learning task, shedding light on their ability to adapt to changing environments.
Strength
The use of human data as a behavioural reference allows for a more nuanced understanding of LLMs' limitations and potential applications.
Demerits
Limitation
The study is limited to a specific task and may not be generalizable to other environments or applications.
Limitation
The findings may have been influenced by the specific LLMs used in the study, which may not represent the broader class of LLMs.
Expert Commentary
This study offers a timely contribution to AI research by exposing the limits of LLM adaptation in non-stationary environments, with clear implications for deploying these models in real-world applications. The human behavioural reference and the well-characterised probabilistic reversal-learning task make the comparison interpretable, and the dissociation of mechanisms behind rigidity (weak loss learning, inflated policy determinism, value polarisation via counterfactual suppression) points directly toward reversal-sensitive diagnostics and volatility-aware evaluation. That said, the focus on a single task and on three specific models limits how far the conclusions generalise to other environments or to the broader class of LLMs, and this should be kept in mind when interpreting the results.
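The mechanisms the commentary highlights can be made concrete with a toy agent. The sketch below (an illustrative delta-rule learner under assumed parameters, not the paper's fitted model) uses separate learning rates for positive and negative prediction errors plus a softmax inverse temperature: a small loss learning rate and a high inverse temperature together produce exactly the post-reversal perseveration described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_agent(alpha_gain, alpha_loss, beta, n_trials=200, reversal_every=50):
    """Two-option bandit with periodic reversals (reward probs 0.8 / 0.2).

    alpha_gain / alpha_loss: learning rates for positive / negative
    prediction errors; beta: softmax inverse temperature (determinism).
    Returns overall accuracy (fraction of choices of the better option).
    """
    q = np.zeros(2)        # action values
    good = 0               # index of the currently better option
    correct = 0
    for t in range(n_trials):
        if t > 0 and t % reversal_every == 0:
            good = 1 - good                              # contingency reversal
        p = np.exp(beta * q) / np.exp(beta * q).sum()    # softmax policy
        choice = rng.choice(2, p=p)
        reward = rng.random() < (0.8 if choice == good else 0.2)
        delta = float(reward) - q[choice]                # prediction error
        alpha = alpha_gain if delta > 0 else alpha_loss  # asymmetric update
        q[choice] += alpha * delta
        correct += int(choice == good)
    return correct / n_trials

# Symmetric learning vs. weak loss learning + inflated determinism:
# the second agent clings to the pre-reversal option far longer.
flexible = simulate_agent(alpha_gain=0.4, alpha_loss=0.4, beta=5.0)
rigid = simulate_agent(alpha_gain=0.4, alpha_loss=0.05, beta=12.0)
```

Note how the rigid agent can still accumulate substantial reward within each block, mirroring the paper's observation that high aggregate payoff can coexist with rigid adaptation.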
Recommendations
- ✓ Further research should be conducted on the development of LLMs that can adapt to changing environments and perform well in real-world applications.
- ✓ The development of reversal-sensitive diagnostics and volatility-aware models should be a priority for LLM researchers and developers.
Sources
Original: arXiv - cs.AI