Academic

Language Model Goal Selection Differs from Humans' in an Open-Ended Task

arXiv:2603.03295v1 Announce Type: cross Abstract: As large language models (LLMs) get integrated into human decision-making, they are increasingly choosing goals autonomously rather than only completing human-defined ones, assuming they will reflect human preferences. However, human-LLM similarity in goal selection remains largely untested. We directly assess the validity of LLMs as proxies for human goal selection in a controlled, open-ended learning task borrowed from cognitive science. Across four state-of-the-art models (GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5, and Centaur), we find substantial divergence from human behavior. While people gradually explore and learn to achieve goals with diversity across individuals, most models exploit a single identified solution (reward hacking) or show surprisingly low performance, with distinct patterns across models and little variability across instances of the same model. Even Centaur, explicitly trained to emulate humans in experimental

arXiv:2603.03295v1 Announce Type: cross Abstract: As large language models (LLMs) get integrated into human decision-making, they are increasingly choosing goals autonomously rather than only completing human-defined ones, assuming they will reflect human preferences. However, human-LLM similarity in goal selection remains largely untested. We directly assess the validity of LLMs as proxies for human goal selection in a controlled, open-ended learning task borrowed from cognitive science. Across four state-of-the-art models (GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5, and Centaur), we find substantial divergence from human behavior. While people gradually explore and learn to achieve goals with diversity across individuals, most models exploit a single identified solution (reward hacking) or show surprisingly low performance, with distinct patterns across models and little variability across instances of the same model. Even Centaur, explicitly trained to emulate humans in experimental settings, poorly captures people's goal selection. Chain-of-thought reasoning and persona steering provide limited improvements. These findings highlight the uniqueness of human goal selection, cautioning against replacing it with current models in applications such as personal assistance, scientific discovery, and policy research.

Executive Summary

This study examines the goal selection of large language models (LLMs) in an open-ended task and compares it to human behavior. The results show significant divergence between human and LLM goal selection, with models often exploiting a single solution or performing poorly. Even models explicitly trained to emulate humans failed to capture human goal selection, highlighting the uniqueness of human decision-making. The findings caution against replacing human goal selection with current LLMs in applications such as personal assistance and policy research.

Key Points

  • LLMs exhibit substantial divergence from human goal selection in open-ended tasks
  • Models tend to exploit a single solution or show low performance
  • Even human-emulating models fail to capture human goal selection

Merits

Novel Experimental Design

The study employs a controlled, open-ended learning task to directly assess LLMs as proxies for human goal selection, providing valuable insights into their limitations.

Demerits

Limited Model Generalizability

The study only examines four state-of-the-art models, which may not be representative of all LLMs, potentially limiting the generalizability of the findings.

Expert Commentary

The study's results underscore the importance of understanding the limitations of LLMs in capturing human goal selection. While LLMs have made significant progress in various tasks, their ability to replicate human decision-making is still limited. The findings highlight the need for continued research into the development of more advanced AI systems that can effectively emulate human goal selection, as well as the importance of human oversight and review in AI-driven decision-making processes.

Recommendations

  • Further research into the development of LLMs that can effectively capture human goal selection
  • Implementation of human oversight and review mechanisms in AI-driven decision-making processes

Sources