GenePlan: Evolving Better Generalized PDDL Plans using Large Language Models

arXiv:2603.09481v1. Abstract: We present GenePlan (GENeralized Evolutionary Planner), a novel framework that leverages large language model (LLM) assisted evolutionary algorithms to generate domain-dependent generalized planners for classical planning tasks described in PDDL. By casting generalized planning as an optimization problem, GenePlan iteratively evolves interpretable Python planners that minimize plan length across diverse problem instances. In empirical evaluation across six existing benchmark domains and two new domains, GenePlan achieved an average SAT score of 0.91, closely matching the performance of the state-of-the-art planners (SAT score 0.93), and significantly outperforming other LLM-based baselines such as chain-of-thought (CoT) prompting (average SAT score 0.64). The generated planners solve new instances rapidly (average 0.49 seconds per task) and at low cost (average $1.82 per domain using GPT-4o).

Executive Summary

The authors present GenePlan, a framework that combines large language models (LLMs) with evolutionary algorithms to generate domain-dependent generalized planners for classical planning tasks described in PDDL. Across eight domains, it reaches an average SAT score of 0.91, close to the 0.93 of state-of-the-art planners and well above chain-of-thought prompting (0.64). The generated planners solve new instances in 0.49 seconds on average, and generating them costs about $1.82 per domain with GPT-4o. However, the evaluation covers only eight domains (six existing benchmarks and two new ones), so how well GenePlan handles more complex planning tasks remains an open question.

Key Points

  • GenePlan leverages LLMs and evolutionary algorithms to generate domain-dependent generalized planners.
  • The framework achieves an average SAT score of 0.91, close to the 0.93 of state-of-the-art planners.
  • GenePlan significantly outperforms LLM-based baselines such as chain-of-thought prompting (average SAT score 0.64).
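
One way to picture the loop described in the abstract is the sketch below. It is a toy under stated assumptions, not the authors' implementation: a "problem" is just an integer, a candidate planner is reduced to a single integer, and propose_variant is a hypothetical placeholder for the LLM call that would rewrite a planner's Python source. Only the overall shape (propose variants, score them by plan length across several instances, keep the best) follows the summary.

```python
import random
from typing import List, Optional, Sequence

# Toy stand-ins (all hypothetical): a problem is an int n (items to move), and a
# candidate planner is summarized by one int, the actions it spends per item.
Candidate = int


def run_planner(candidate: Candidate, problem: int) -> Optional[List[str]]:
    """Execute a candidate on one problem; return a plan, or None on failure."""
    if candidate < 1:  # fewer than one action per item cannot solve the task
        return None
    return ["move"] * (candidate * problem)


def fitness(candidate: Candidate, problems: Sequence[int]) -> float:
    """Lower is better: total plan length, plus a large penalty per unsolved instance."""
    total = 0.0
    for problem in problems:
        plan = run_planner(candidate, problem)
        total += len(plan) if plan is not None else 1e6
    return total


def propose_variant(parent: Candidate, rng: random.Random) -> Candidate:
    """Placeholder for the LLM call that would propose a modified planner."""
    return parent + rng.choice([-1, 0, 1])


def evolve(
    problems: Sequence[int],
    population: Sequence[Candidate],
    generations: int,
    rng: random.Random,
) -> Candidate:
    """Elitist loop: parents and children compete, the best survive each generation."""
    current = list(population)
    for _ in range(generations):
        children = [propose_variant(parent, rng) for parent in current]
        pool = current + children
        pool.sort(key=lambda c: fitness(c, problems))
        current = pool[: len(population)]
    return current[0]


if __name__ == "__main__":
    training_problems = [2, 3, 5]
    best = evolve(training_problems, [4, 5, 6], generations=20, rng=random.Random(0))
    print("best candidate:", best, "fitness:", fitness(best, training_problems))
```

Because parents stay in the selection pool, the best fitness can never get worse from one generation to the next. Scoring against several instances at once is what pushes a candidate toward being a generalized planner rather than a solution to a single task.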

Merits

Competitive Performance

GenePlan's average SAT score of 0.91 is close to the 0.93 achieved by state-of-the-art planners, indicating that the evolved planners are effective across the evaluated domains.
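
The summary does not define the SAT score. Assuming it follows the satisficing-track convention of the International Planning Competition (reference plan cost divided by the cost of the plan found, and 0 for an unsolved task), it could be computed as in the sketch below. The function names and the example costs are illustrative only and are not taken from the paper.

```python
from typing import Optional, Sequence, Tuple


def sat_score(reference_cost: float, plan_cost: Optional[float]) -> float:
    """Per-task score under the assumed IPC-style convention.

    reference_cost: cost of the best known plan for the task (must be > 0).
    plan_cost: cost of the plan the planner produced, or None if it failed.
    """
    if plan_cost is None:
        return 0.0  # unsolved tasks score zero
    if reference_cost <= 0 or plan_cost <= 0:
        raise ValueError("costs must be positive")
    # Capped at 1.0 in case the planner beats the recorded reference plan.
    return min(1.0, reference_cost / plan_cost)


def average_sat_score(results: Sequence[Tuple[float, Optional[float]]]) -> float:
    """Mean per-task score over (reference_cost, plan_cost) pairs."""
    if not results:
        raise ValueError("no results to average")
    return sum(sat_score(ref, cost) for ref, cost in results) / len(results)


if __name__ == "__main__":
    # Three made-up tasks: one matched the reference, one plan was twice as long,
    # and one task was not solved.
    print(average_sat_score([(10, 10), (10, 20), (10, None)]))
```

Under this reading, a score near 1 means the planner solves almost every task with plans about as short as the reference, so both coverage and plan length feed into the single number.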

Rapid Solution Times

The generated planners solve new instances in 0.49 seconds per task on average, which makes them candidates for latency-sensitive applications.

Low Generation Cost

Generating a planner costs $1.82 per domain on average using GPT-4o. Because the result is an ordinary Python planner, solving further instances in the same domain should not require additional LLM calls.

Demerits

Limited Domain Scope

The evaluation covers eight domains (six existing benchmarks and two new ones); whether GenePlan scales to more complex planning tasks remains uncertain.

Dependence on LLMs

GenePlan's performance relies heavily on the quality of the underlying LLM. The reported cost figure is for GPT-4o, and results may not transfer to other models or to models trained on different data.

Expert Commentary

The GenePlan framework demonstrates the potential of combining LLMs and evolutionary algorithms to generate high-quality planners for classical planning tasks. Its limitations, chiefly the dependence on LLMs and the narrow set of evaluated domains, mean further research is needed before its potential can be fully assessed. Because the evolved planners are interpretable Python programs, the framework may also offer insights into the explainability of AI-generated planners, which is a critical area of research. Overall, GenePlan is a promising development in automated planning.

Recommendations

  • Future research should focus on expanding the scope of GenePlan to more complex planning tasks and evaluating its performance in real-world applications.
  • The authors should investigate the use of different LLMs and training datasets to assess the robustness and generalizability of GenePlan.