State Design Matters: How Representations Shape Dynamic Reasoning in Large Language Models

arXiv:2602.15858v1 Announce Type: cross Abstract: As large language models (LLMs) move from static reasoning tasks toward dynamic environments, their success depends on the ability to navigate and respond to an environment that changes as they interact at inference time. An underexplored factor in these settings is the representation of the state. Holding model parameters fixed, we systematically vary three key aspects: (1) state granularity (long form versus summary), (2) structure (natural language versus symbolic), and (3) spatial grounding (text-only versus images or textual map encodings) across sequential decision-making benchmarks. We find that trajectory summarisation improves performance by reducing noise and stabilising long-horizon reasoning. Second, natural language representations are the most robust across models, whereas structured encodings help mainly for models with strong code or structured output priors, such as JSON schemas. Third, while image-inputs show some benefit, text-based spatial encodings prove most effective. This advantage stems not from the spatial information itself, but from the act of construction, which compels the model to perform the spatial reasoning that static input does not elicit. Overall, we demonstrate that design choices for representing state are a decisive factor in performance, distinct from the availability of information itself. We note, however, that even with improved representations, current LLMs and VLMs remain brittle over long horizons, particularly when they must synthesise information to manage multiple subtasks to reach a goal.

Executive Summary

This article examines state representation in large language models (LLMs) as they move from static reasoning tasks to dynamic environments. Holding model parameters fixed, the study systematically varies three aspects of state representation: granularity, structure, and spatial grounding. The findings show that trajectory summarisation improves performance by reducing noise and stabilising long-horizon reasoning; that natural language representations are the most robust across models, while structured encodings mainly benefit models with strong code or structured-output priors; and that text-based spatial encodings outperform image inputs, with the benefit stemming less from the spatial information itself than from the act of constructing the encoding, which compels the model to perform explicit spatial reasoning. Even with improved representations, however, current LLMs and vision-language models (VLMs) remain brittle over long horizons, particularly when they must synthesise information across multiple subtasks to reach a goal. The research underscores that state design choices are a decisive factor in performance, distinct from the availability of information itself.

Key Points

  • Representations of state play a crucial role in LLMs' performance in dynamic environments.
  • Trajectory summarisation improves performance by reducing noise and stabilising long-horizon reasoning.
  • Natural language representations are the most robust across models, while structured encodings mainly benefit models with strong code or structured-output priors.
  • Text-based spatial encodings outperform image inputs, largely because constructing them compels the model to perform explicit spatial reasoning.
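To make the representation axes concrete, here is a minimal, hypothetical sketch of how a single toy grid-world state could be rendered three ways: as natural language, as a JSON structured encoding, and as a textual map. The field names and rendering formats are illustrative assumptions, not the paper's actual prompt templates.

```python
import json

# A toy grid-world state; field names are illustrative, not from the paper.
state = {
    "agent": (1, 1),
    "goal": (2, 0),
    "grid_size": 3,
}

def render_natural(s):
    # Natural-language rendering: the most robust format across models per the paper.
    ar, ac = s["agent"]
    gr, gc = s["goal"]
    return f"You are at row {ar}, column {ac}. The goal is at row {gr}, column {gc}."

def render_json(s):
    # Structured (symbolic) rendering: mainly helps models with strong JSON priors.
    return json.dumps({"agent": list(s["agent"]), "goal": list(s["goal"])})

def render_map(s):
    # Textual map encoding: constructing it forces explicit spatial reasoning.
    n = s["grid_size"]
    rows = []
    for r in range(n):
        row = []
        for c in range(n):
            if (r, c) == s["agent"]:
                row.append("A")
            elif (r, c) == s["goal"]:
                row.append("G")
            else:
                row.append(".")
        rows.append("".join(row))
    return "\n".join(rows)
```

All three renderings carry the same information; the paper's point is that which one the model receives (or is asked to construct) changes downstream performance.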

Merits

Strength in Experimental Design

The study systematically varies three aspects of state representation (granularity, structure, and spatial grounding) while holding model parameters fixed, isolating the impact of each factor on LLMs' performance.
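The granularity axis (long-form trajectory versus summary) can be sketched as follows. In the paper's setting the summary would be produced by the LLM itself at inference time; this hypothetical snippet uses a trivial rule-based stand-in that keeps salient events plus the most recent steps.

```python
# Hypothetical trajectory buffer: a running summary instead of the full log.
# The "salient" flag and keep_last cutoff are illustrative assumptions.

def build_prompt_state(history, keep_last=2):
    """Compress a long trajectory into salient events plus recent steps."""
    recent = history[-keep_last:]
    salient = [e for e in history if e.get("salient") and e not in recent]
    lines = ["Summary of earlier steps:"]
    lines += [f"- {e['event']}" for e in salient]
    lines.append("Most recent steps:")
    lines += [f"- {e['event']}" for e in recent]
    return "\n".join(lines)

history = [
    {"event": "moved north", "salient": False},
    {"event": "picked up key", "salient": True},
    {"event": "moved east", "salient": False},
    {"event": "moved east", "salient": False},
]
```

The intuition matches the finding: routine moves (noise) are dropped while the key event survives, shortening the context the model must reason over at each step.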

Insight into State Representation

The research offers valuable insights into the importance of state representation in LLMs, highlighting the benefits of natural language, structured, and spatial encodings.

Demerits

Limited Generalizability

The findings may not be generalizable to all LLMs and VLMs, as the study focuses on specific models and tasks.

Brittleness over Long Horizons

Current LLMs and VLMs remain brittle over long horizons, particularly when synthesising information for multiple subtasks, limiting their practical applications.

Expert Commentary

The article makes a significant contribution to the field of LLMs by shedding light on the critical role of state representation in their performance. The findings have important implications for the design and development of LLMs, particularly for applications requiring dynamic reasoning and decision-making. However, the study's limitations, including the brittleness of current LLMs and VLMs over long horizons, highlight the need for further research in this area. Future studies should explore the development of more robust and reliable LLMs, as well as the integration of explainability and transparency mechanisms to ensure the safe and effective deployment of these models in critical applications.

Recommendations

  • Future research should focus on developing more robust and reliable LLMs, particularly in terms of their ability to synthesise information for multiple subtasks over long horizons.
  • The integration of explainability and transparency mechanisms into LLMs is essential for ensuring their safe and effective deployment in critical applications, such as healthcare and finance.
