
Language Model Planners do not Scale, but do Formalizers?

Owen Jiang, Cassie Huang, Ashish Sabharwal, Li Zhang

arXiv:2603.23844v1 (announce type: new). Abstract: Recent work shows overwhelming evidence that LLMs, even those trained to scale their reasoning trace, perform unsatisfactorily when solving planning problems that are too complex. Whether the same conclusion holds for LLM formalizers that generate solver-oriented programs remains unknown. We systematically show that LLM formalizers greatly out-scale LLM planners, some retaining perfect accuracy in the classic BlocksWorld domain with a huge state space of size up to $10^{165}$. While the performance of smaller LLM formalizers degrades with problem complexity, we show that a divide-and-conquer formalizing technique can greatly improve their robustness. Finally, we introduce unraveling problems where one line of problem description realistically corresponds to exponentially many lines of formal language such as the Planning Domain Definition Language (PDDL), greatly challenging LLM formalizers. We tackle this challenge by introducing a new paradigm, namely LLM-as-higher-order-formalizer, where an LLM generates a program generator. This decouples token output from the combinatorial explosion of the underlying formalization and search space.

Executive Summary

This article asks whether the well-documented failure of large language models (LLMs) to plan at scale also applies when the LLM acts as a formalizer, generating a solver-oriented program instead of the plan itself. Through a systematic comparison, the authors show that LLM formalizers scale far better than LLM planners: some retain perfect accuracy in the classic BlocksWorld domain with state spaces as large as $10^{165}$, and a divide-and-conquer formalizing technique makes smaller formalizers more robust as problems grow. The authors also introduce unraveling problems, in which one line of problem description corresponds to exponentially many lines of a formal language such as the Planning Domain Definition Language (PDDL), and which strain LLM formalizers. To address this, they propose LLM-as-higher-order-formalizer, in which the LLM generates a program generator, decoupling token output from the combinatorial explosion of the underlying formalization and search space. The work bears directly on how LLMs are deployed in planning and automated reasoning applications.

Key Points

  • LLM formalizers scale to complex planning problems far better than LLM planners
  • Some LLM formalizers retain perfect accuracy in the BlocksWorld domain with a state space of up to $10^{165}$
  • A divide-and-conquer formalizing technique improves the robustness of smaller LLM formalizers
  • Unraveling problems, where one line of description expands to exponentially many lines of PDDL, pose a significant challenge to LLM formalizers
  • The LLM-as-higher-order-formalizer paradigm addresses unraveling problems by having the LLM generate a program generator
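To give a feel for how quickly a BlocksWorld state space grows, the following Python sketch counts states under one common convention. It is an illustration only and rests on assumptions not stated in the source: a state is taken to be an arrangement of n labeled blocks into ordered stacks on an unbounded table, and the gripper is ignored. The paper may count states differently, and the function names are hypothetical.

```python
def count_blocksworld_states(n: int) -> int:
    """Count arrangements of n labeled blocks into ordered stacks.

    Assumption: unbounded table, no block held by the gripper.
    Uses the recurrence a(k) = (2k - 1) a(k-1) - (k - 1)(k - 2) a(k-2),
    with a(0) = a(1) = 1 (the "sets of lists" sequence).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 2:
        return 1
    prev, curr = 1, 1  # a(0), a(1)
    for k in range(2, n + 1):
        prev, curr = curr, (2 * k - 1) * curr - (k - 1) * (k - 2) * prev
    return curr


def smallest_n_reaching(exponent: int) -> int:
    """Smallest block count whose state count is at least 10**exponent."""
    target = 10 ** exponent
    n = 0
    while count_blocksworld_states(n) < target:
        n += 1
    return n


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, count_blocksworld_states(n))
    # Under the stated assumption, this reports how many blocks are
    # needed before the count reaches the 10^165 scale cited above.
    print(smallest_n_reaching(165))
```

Because the count grows faster than exponentially in n, a planner that emits the plan token by token faces a rapidly expanding search problem, whereas a formalizer only has to describe the problem and can leave the search to a solver.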

Merits

Strength in addressing a long-standing limitation

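The authors provide a systematic comparison of LLM planners and LLM formalizers, testing whether the known scaling failure of planners carries over to formalizers, a question the abstract describes as previously unanswered. The finding that some formalizers stay accurate at very large state spaces gives the field a concrete direction for scaling LLM-based planning.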

Innovative solution to a significant challenge

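The LLM-as-higher-order-formalizer paradigm, in which the LLM writes a program that generates the formalization instead of writing the formalization directly, offers a promising answer to unraveling problems. Because the length of the LLM's output no longer grows with the size of the formal description, the approach could widen the range of planning and automated reasoning problems that LLMs can handle.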
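A minimal Python sketch of the idea follows. It is not the authors' implementation: the function name, the tower-building task, and the domain name `blocksworld` with predicates `on`, `ontable`, `clear`, and `handempty` are assumptions chosen for illustration. The output here grows only linearly with n, while the paper's unraveling problems grow exponentially, but the principle is the same: a short, fixed-size generator stands in for a formalization of arbitrary length.

```python
def generate_tower_problem(n: int) -> str:
    """Emit a PDDL problem in which n blocks start on the table and
    must end up in a single tower (b1 on b2, b2 on b3, ...).

    Hypothetical example: predicate and domain names follow the
    conventional BlocksWorld encoding, not a file from the paper.
    """
    if n < 2:
        raise ValueError("need at least two blocks")
    blocks = [f"b{i}" for i in range(1, n + 1)]

    # Initial state: gripper empty, every block on the table and clear.
    init = ["(handempty)"]
    init += [f"(ontable {b})" for b in blocks]
    init += [f"(clear {b})" for b in blocks]

    # Goal: each block sits on the next one in the list.
    goal = [f"(on {top} {bottom})" for top, bottom in zip(blocks, blocks[1:])]

    lines = [
        f"(define (problem tower-{n})",
        "  (:domain blocksworld)",
        f"  (:objects {' '.join(blocks)})",
        "  (:init",
        *[f"    {fact}" for fact in init],
        "  )",
        "  (:goal (and",
        *[f"    {fact}" for fact in goal],
        "  ))",
        ")",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    small = generate_tower_problem(3)
    large = generate_tower_problem(100)
    print(small)
    # The generator's own length is constant; only its output grows.
    print(len(small.splitlines()), len(large.splitlines()))
```

In this setup the LLM would be asked to produce something like `generate_tower_problem` once, and the resulting PDDL text would be passed to an off-the-shelf planner, so the number of tokens the LLM emits does not depend on n.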

Demerits

Limited scope of the comparison

The comparison is limited to a specific set of LLM formalizers and planners, which may not be representative of the broader range of models and techniques available.

Unraveling problems may not be representative of real-world challenges

The authors introduce unraveling problems as a challenge to LLM formalizers, but it remains unclear whether these problems are representative of the types of challenges that LLMs will encounter in real-world applications.

Expert Commentary

This article is a valuable contribution to planning and automated reasoning. The systematic comparison clarifies where LLMs succeed and fail: generating the plan directly does not scale, while generating a formalization for a solver does. The LLM-as-higher-order-formalizer paradigm then extends that advantage to unraveling problems, where even the formalization is too long to write out directly. Two limitations stand out: the comparison covers a limited set of models and techniques, and it is not yet established that unraveling problems are representative of real-world challenges. Even so, the work is likely to influence how LLMs are applied in planning and automated reasoning.

Recommendations

  • Future work should seek to expand the scope of the comparison to include a broader range of LLM formalizers and planners.
  • The authors should validate the relevance of unraveling problems to real-world challenges.
  • The authors should explore the implications of their work for policies and regulations governing the use of LLMs in planning and automated reasoning applications.

Sources

Original: arXiv - cs.CL