GenePlan: Evolving Better Generalized PDDL Plans using Large Language Models

arXiv:2603.09481v1. Abstract: We present GenePlan (GENeralized Evolutionary Planner), a novel framework that leverages large language model (LLM) assisted evolutionary algorithms to generate domain-dependent generalized planners for classical planning tasks described in PDDL. By casting generalized planning as an optimization problem, GenePlan iteratively evolves interpretable Python planners that minimize plan length across diverse problem instances. In empirical evaluation across six existing benchmark domains and two new domains, GenePlan achieved an average SAT score of 0.91, closely matching the performance of the state-of-the-art planners (SAT score 0.93), and significantly outperforming other LLM-based baselines such as chain-of-thought (CoT) prompting (average SAT score 0.64). The generated planners solve new instances rapidly (average 0.49 seconds per task) and at low cost (average $1.82 per domain using GPT-4o).

Andrew Murray, Danial Dervovic, Alberto Pozanco, Michael Cashmore · March 11, 2026

#cs.AI

Executive Summary

The authors present GenePlan, a framework that uses large language models (LLMs) within an evolutionary algorithm to generate domain-dependent generalized planners for classical planning tasks described in PDDL. Across six existing benchmark domains and two new ones, it achieves an average SAT score of 0.91, close to the 0.93 of state-of-the-art planners and well above LLM-based baselines such as chain-of-thought prompting (0.64). The generated planners solve new instances quickly (0.49 seconds per task on average) and are inexpensive to produce (about $1.82 per domain with GPT-4o). However, the evaluation covers only these eight domains, and the applicability of GenePlan to more complex planning tasks remains uncertain.

Key Points

▸ GenePlan leverages LLMs and evolutionary algorithms to generate domain-dependent generalized planners.
▸ The framework nearly matches state-of-the-art planners (average SAT score 0.91 versus 0.93).
▸ GenePlan significantly outperforms LLM-based baselines such as chain-of-thought prompting (average SAT score 0.91 versus 0.64).
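The evolutionary loop summarized in the points above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the corridor domain, the three hand-written candidate planners, and the names `fitness`, `propose_variant`, and `evolve` are all assumptions made for the example. In particular, `propose_variant` is a deterministic stand-in for the step where an LLM would be asked to rewrite a parent planner.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical toy domain: move along a corridor from a start cell to a goal cell.
Instance = Tuple[int, int]          # (start, goal), both non-negative
Plan = List[str]                    # sequence of "left" / "right" actions
Planner = Callable[[Instance], Optional[Plan]]


def is_valid(instance: Instance, plan: Plan) -> bool:
    """Simulate the plan and check that it ends on the goal cell."""
    position, goal = instance
    for action in plan:
        position += 1 if action == "right" else -1
    return position == goal


def fitness(planner: Planner, instances: Sequence[Instance], penalty: int = 1000) -> int:
    """Total plan length over all instances; failed instances add a fixed penalty."""
    total = 0
    for instance in instances:
        plan = planner(instance)
        if plan is None or not is_valid(instance, plan):
            total += penalty
        else:
            total += len(plan)
    return total


def only_right(instance: Instance) -> Plan:
    """Weak seed planner: fails whenever the goal lies to the left."""
    start, goal = instance
    return ["right"] * max(goal - start, 0)


def detour(instance: Instance) -> Plan:
    """Always valid but wasteful: walks back to cell 0 first."""
    start, goal = instance
    return ["left"] * start + ["right"] * goal


def direct(instance: Instance) -> Plan:
    """Shortest plan: walk straight to the goal."""
    start, goal = instance
    step = "right" if goal > start else "left"
    return [step] * abs(goal - start)


CANDIDATE_POOL: List[Planner] = [only_right, detour, direct]


def propose_variant(parent: Planner, step: int) -> Planner:
    """Stand-in for an LLM call that would rewrite `parent`; here the parent is
    ignored and a fixed pool is cycled (an assumption for this sketch)."""
    return CANDIDATE_POOL[step % len(CANDIDATE_POOL)]


def evolve(instances: Sequence[Instance], generations: int = 3,
           children_per_generation: int = 2, population_size: int = 2) -> Planner:
    """Keep the planners with the lowest total plan length across generations."""
    population: List[Planner] = [only_right]
    step = 0
    for _ in range(generations):
        parent = min(population, key=lambda p: fitness(p, instances))
        children = []
        for _ in range(children_per_generation):
            children.append(propose_variant(parent, step))
            step += 1
        ranked = sorted(population + children, key=lambda p: fitness(p, instances))
        population = ranked[:population_size]
    return population[0]


TRAINING_INSTANCES = [(0, 3), (5, 2), (4, 4), (1, 6)]
best = evolve(TRAINING_INSTANCES)
print(best.__name__, fitness(best, TRAINING_INSTANCES))
```

The fitness function scores a planner on several instances at once, which is what pushes selection toward planners that generalize rather than ones tuned to a single problem.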

Merits

Competitive Performance

GenePlan achieves an average SAT score of 0.91 against 0.93 for state-of-the-art planners, indicating that the evolved planners produce plans of nearly comparable quality.
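The abstract reports SAT scores without defining them. Assuming the metric follows the satisficing-track convention of the International Planning Competition, where a solved task scores the reference plan cost divided by the produced plan cost and an unsolved task scores zero, the averages could be computed roughly as follows. The function names and the cap at 1.0 are assumptions of this sketch, not details taken from the paper.

```python
from typing import Optional, Sequence, Tuple


def sat_score(plan_cost: Optional[float], reference_cost: float) -> float:
    """Per-task score: reference_cost / plan_cost, or 0.0 if no plan was found.

    Assumes strictly positive costs. The cap at 1.0 (for plans that beat the
    reference) is an assumption of this sketch.
    """
    if plan_cost is None:
        return 0.0
    if plan_cost <= 0 or reference_cost <= 0:
        raise ValueError("costs must be strictly positive")
    return min(1.0, reference_cost / plan_cost)


def average_sat_score(results: Sequence[Tuple[Optional[float], float]]) -> float:
    """Mean per-task score over (plan_cost, reference_cost) pairs."""
    if not results:
        raise ValueError("no results to average")
    return sum(sat_score(cost, ref) for cost, ref in results) / len(results)


# Hypothetical results: one optimal plan, one plan twice as long, one unsolved task.
example_results = [(10, 10), (20, 10), (None, 5)]
example_average = average_sat_score(example_results)
print(example_average)
```

Under this reading, a score of 0.91 means that plans are on average close to the reference length and that few tasks go unsolved, since each unsolved task contributes zero.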

Rapid Solution Times

The generated planners solve new instances in 0.49 seconds per task on average, which suggests they could be practical where plans must be produced quickly, although the abstract reports no evaluation outside benchmark domains.

Low Computational Costs

Generating a planner is inexpensive: LLM usage averages $1.82 per domain with GPT-4o. Because the resulting planner is an ordinary Python program, that cost is paid once per domain and not once per problem instance.

Demerits

Limited Domain Scope

The evaluation covers six existing benchmark domains and two new ones, so the applicability of GenePlan to more complex planning tasks remains uncertain.

Dependence on LLMs

GenePlan's performance relies heavily on the quality of the underlying LLM. The abstract reports results only for GPT-4o, so it is unclear how well the approach transfers to other models.

Expert Commentary

The GenePlan framework demonstrates the potential of combining LLMs and evolutionary algorithms to generate high-quality planners for classical planning tasks. However, its dependence on the underlying LLM and the limited set of evaluated domains mean that further research is needed to assess how far the approach extends. Because the generated planners are interpretable Python programs, they can be read and audited directly, which is relevant to ongoing work on the explainability of AI-generated planners. Overall, GenePlan is a promising development in automated planning.
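To make the idea of an interpretable generalized planner concrete, here is a hand-written example of the kind of program such a framework might output. It is not taken from the paper: the one-gripper transport domain, the action names, and the function `gripper_planner` are hypothetical and chosen only for illustration.

```python
from typing import List


def gripper_planner(num_balls: int) -> List[str]:
    """Generalized plan for a hypothetical transport domain.

    Assumed setup: a robot with a single gripper starts in `rooma` together
    with `num_balls` balls, and the goal is to have every ball in `roomb`.
    The same function solves every instance of the domain, whatever its size.
    """
    plan: List[str] = []
    for ball in range(1, num_balls + 1):
        plan.append(f"(pick ball{ball} rooma)")
        plan.append("(move rooma roomb)")
        plan.append(f"(drop ball{ball} roomb)")
        # Return for the next ball, but skip the trip after the last delivery.
        if ball < num_balls:
            plan.append("(move roomb rooma)")
    return plan


for action in gripper_planner(2):
    print(action)
```

A reader can verify the strategy by inspection: each ball costs four actions except the last, which costs three. That kind of direct check is not available for plans produced one instance at a time by prompting an LLM.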

Recommendations

✓ Future research should focus on expanding the scope of GenePlan to more complex planning tasks and evaluating its performance in real-world applications.
✓ The authors should investigate the use of different LLMs and training datasets to assess the robustness and generalizability of GenePlan.

Sources

arXiv - cs.AI

GenePlan: Evolving Better Generalized PDDL Plans using Large Language Models


