Evaluation of Deontic Conditional Reasoning in Large Language Models: The Case of Wason's Selection Task
arXiv:2603.06416v1 Announce Type: new Abstract: As large language models (LLMs) advance in linguistic competence, their reasoning abilities are gaining increasing attention. In humans, reasoning often performs well in domain specific settings, particularly in normative rather than purely formal contexts. Although prior studies have compared LLM and human reasoning, the domain specificity of LLM reasoning remains underexplored. In this study, we introduce a new Wason Selection Task dataset that explicitly encodes deontic modality to systematically distinguish deontic from descriptive conditionals, and use it to examine LLMs' conditional reasoning under deontic rules. We further analyze whether observed error patterns are better explained by confirmation bias (a tendency to seek rule-supporting evidence) or by matching bias (a tendency to ignore negation and select items that lexically match elements of the rule). Results show that, like humans, LLMs reason better with deontic rules and display matching-bias-like errors. Together, these findings suggest that the performance of LLMs varies systematically across rule types and that their error patterns can parallel well-known human biases in this paradigm.
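The abstract's distinction between confirmation bias and matching bias can be made concrete with a small sketch. Using the classic letter/number card layout as an illustrative reconstruction (not the paper's actual dataset), the two biases predict the same card choices for an affirmative rule but come apart once the consequent is negated, which is what makes negated rules diagnostic for this kind of error analysis:

```python
def predicted_selections(negated_consequent: bool) -> dict:
    """Predicted card choices for the rule 'if vowel then (not) even'.

    Cards on the table: 'A' (vowel), 'K' (consonant), '2' (even), '7' (odd).
    Illustrative reconstruction of the standard paradigm, not the paper's
    dataset format.
    """
    # Which card logically falsifies the consequent depends on negation:
    # for 'then even', the odd card '7' is the not-Q card; for
    # 'then not even', the even card '2' is.
    not_q_card = "2" if negated_consequent else "7"
    q_card = "7" if negated_consequent else "2"
    return {
        # Normative answer: the P card plus the card that could falsify Q.
        "normative": {"A", not_q_card},
        # Matching bias: select the faces lexically named in the rule,
        # ignoring the negation -- always 'A' (vowel) and '2' (even).
        "matching": {"A", "2"},
        # Confirmation bias: select the cards that could verify the rule.
        "confirmation": {"A", q_card},
    }

# Affirmative rule: matching and confirmation predict the same selection,
# so they cannot be told apart from errors alone.
aff = predicted_selections(negated_consequent=False)
# Negated consequent: the predictions diverge, so observed errors can be
# attributed to one bias or the other.
neg = predicted_selections(negated_consequent=True)
print(aff)
print(neg)
```

Note that under the negated rule the matching-bias prediction coincides with the normatively correct answer, which is why bias attribution requires comparing error patterns across both rule forms.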
Executive Summary
This study evaluates the deontic conditional reasoning of large language models (LLMs) using a modified Wason Selection Task dataset. The results show that LLMs perform better with deontic rules than with descriptive ones and exhibit error patterns resembling human matching bias. The novel dataset and systematic error analysis are a meaningful contribution to the field; however, the findings are largely observational, the study's scope is limited, and the results may not generalize to all LLMs or to real-world applications.
Key Points
- ▸ LLMs reason more accurately with deontic rules than with descriptive ones
- ▸ LLMs display matching-bias-like errors in conditional reasoning
- ▸ The study introduces a novel Wason Selection Task dataset with deontic modality
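As a concrete illustration of what the key points describe, a minimal paired task item might look like the following. The field names and card contents are assumptions for illustration, not the paper's actual dataset schema; the deontic framing encodes the obligation with an explicit modal ("must"):

```python
from dataclasses import dataclass

@dataclass
class WSTItem:
    rule: str
    cards: list[str]
    correct: set[str]  # normative selection: the P card and the not-Q card

# Descriptive conditional: states a factual regularity about the cards.
descriptive = WSTItem(
    rule="If a card has a vowel on one side, it has an even number on the other.",
    cards=["A", "K", "2", "7"],
    correct={"A", "7"},  # check the vowel card and the odd-number card
)

# Deontic conditional: the modal 'must' marks an obligation -- the framing
# under which both humans and, per the abstract, LLMs perform better.
deontic = WSTItem(
    rule="If a person is drinking beer, then they must be over 18.",
    cards=["drinking beer", "drinking cola", "age 22", "age 16"],
    correct={"drinking beer", "age 16"},  # the potential rule violators
)

print(descriptive.correct)
print(deontic.correct)
```

In both framings the normative answer is the same logical pair (P and not-Q); only the surface modality differs, which is what lets the dataset isolate the effect of deontic framing.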
Merits
Strength in Design
The study's use of a novel dataset with deontic modality allows for a systematic evaluation of LLM performance and error patterns, providing valuable insights into LLM reasoning.
Insights into LLM Reasoning
The finding that LLMs reason better with deontic rules and exhibit matching-bias-like errors contributes to our understanding of LLM reasoning and its limitations.
Demerits
Limited Generalizability
Because of its limited scope, the study's results may not generalize to all LLMs or to real-world applications.
Observational Nature
The study's findings are largely observational, and further research is needed to fully understand the implications of the results.
Expert Commentary
This study is a significant contribution to artificial intelligence and cognitive science, showing that LLM reasoning, like human reasoning, varies systematically with rule type. The novel dataset and systematic error analysis are clear strengths, though the limited scope means further research is needed before the results can be generalized. The findings have practical implications for developing more human-like AI systems, and policy implications for building AI systems that can interpret normative rules when interacting with humans.
Recommendations
- ✓ Future research should develop more rigorous methods for evaluating LLM reasoning and diagnosing its failure modes.
- ✓ Researchers should explore the development of LLMs that can reason better with deontic rules by incorporating explicit representation of deontic modality.