Code World Models for Parameter Control in Evolutionary Algorithms

Camilo Chacón Sartori, Guillem Rodríguez Corominas

arXiv:2602.22260v1 Announce Type: new Abstract: Can an LLM learn how an optimizer behaves -- and use that knowledge to control it? We extend Code World Models (CWMs), LLM-synthesized Python programs that predict environment dynamics, from deterministic games to stochastic combinatorial optimization. Given suboptimal trajectories of (1+1)-RLS$_k$, the LLM synthesizes a simulator of the optimizer's dynamics; greedy planning over this simulator then selects the mutation strength $k$ at each step. On LeadingOnes and OneMax, CWM-greedy performs within 6% of the theoretically optimal policy -- without ever seeing optimal-policy trajectories. On Jump$_k$, where a deceptive valley causes all adaptive baselines to fail (0% success rate), CWM-greedy achieves a 100% success rate -- without any collection policy using oracle knowledge of the gap parameter. On the NK-Landscape, where no closed-form model exists, CWM-greedy outperforms all baselines across fifteen independently generated instances (36.94 vs. 36.32; $p<0.001$) when the prompt includes empirical transition statistics. The CWM also outperforms DQN in sample efficiency (200 offline trajectories vs. 500 online episodes), success rate (100% vs. 58%), and generalization ($k=3$: 78% vs. 0%). Robustness experiments confirm stable synthesis across 5 independent runs.

Executive Summary

This study applies Code World Models (CWMs) to parameter control in evolutionary algorithms. An LLM synthesizes a Python simulator of the optimizer's dynamics from suboptimal trajectories, and greedy planning over that simulator selects the mutation strength $k$ at each step. The approach performs within 6% of the theoretically optimal policy on LeadingOnes and OneMax, achieves a 100% success rate on Jump$_k$ where all adaptive baselines fail, and outperforms both classical baselines and DQN on sample efficiency, success rate, and generalization. These results suggest that LLM-synthesized simulators are a practical route to adaptive parameter control.
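To make the control loop concrete, here is a minimal sketch in Python. The `cwm_predict` function is a hypothetical hand-written stand-in for an LLM-synthesized simulator (the paper's CWMs are synthesized, not hand-coded), and all names and parameters are illustrative; it estimates the one-step outcome of an elitist (1+1)-RLS$_k$ step on OneMax, and the greedy planner picks the $k$ with the best predicted outcome.

```python
import random

def onemax(x):
    """OneMax fitness: the number of one-bits."""
    return sum(x)

def rls_mutate(x, k):
    """(1+1)-RLS_k mutation: flip exactly k distinct bits."""
    y = list(x)
    for i in random.sample(range(len(x)), k):
        y[i] ^= 1
    return y

def cwm_predict(f, n, k, samples=200):
    """Toy stand-in for an LLM-synthesized CWM: Monte Carlo estimate of the
    expected fitness after one elitist RLS_k step, starting from OneMax
    fitness f on n bits (OneMax dynamics depend only on the count of ones)."""
    x = [1] * f + [0] * (n - f)
    return sum(max(f, onemax(rls_mutate(x, k)))  # elitism keeps the better one
               for _ in range(samples)) / samples

def greedy_k(f, n, k_max=4):
    """Greedy planning: choose the k whose predicted one-step outcome is best."""
    return max(range(1, k_max + 1), key=lambda k: cwm_predict(f, n, k))
```

Near the optimum the planner reverts to single-bit flips (e.g. `greedy_k(19, 20)` returns 1), which matches the known optimal behaviour on OneMax: large $k$ is only worth trying far from the optimum.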

Key Points

  • CWMs can learn to predict optimizer behavior and control it using LLM-synthesized simulators.
  • The CWM-greedy approach outperforms adaptive baselines on Jump$_k$ (100% vs. 0% success rate) and on the NK-Landscape (36.94 vs. 36.32, $p<0.001$).
  • CWMs outperform DQN in sample efficiency (200 offline trajectories vs. 500 online episodes), success rate (100% vs. 58%), and generalization ($k=3$: 78% vs. 0%).

Merits

Strength in Optimization

The study shows that CWM-greedy can control search on hard problems, most notably the NK-Landscape, where no closed-form model of the dynamics exists and CWM-greedy still outperforms all baselines (36.94 vs. 36.32, $p<0.001$). The LLM-synthesized simulator evidently captures the optimizer's dynamics well enough for greedy planning over it to pay off.
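For context, an NK-Landscape of the kind referenced above can be sketched as follows. This is a toy adjacent-neighbourhood variant with assumed parameters, not one of the paper's fifteen benchmark instances: each bit's contribution depends on itself and its $K$ right neighbours, drawn from seeded random tables, which is what rules out a closed-form model of the dynamics once $K>0$.

```python
import random

def make_nk(n, k, seed=0):
    """Toy NK-Landscape (adjacent-neighbour variant): bit i's contribution
    depends on itself and its k right neighbours, with wrap-around.
    Contribution tables are filled lazily from a seeded RNG, so repeated
    evaluations of the same point on the same instance agree."""
    rng = random.Random(seed)
    tables = [dict() for _ in range(n)]

    def fitness(x):
        total = 0.0
        for i in range(n):
            key = tuple(x[(i + j) % n] for j in range(k + 1))
            if key not in tables[i]:
                tables[i][key] = rng.random()  # fixed once drawn
            total += tables[i][key]
        return total / n  # mean per-bit contribution, in [0, 1]

    return fitness
```

With $K>0$, flipping one bit changes $K+1$ contribution terms at once, so the fitness effect of a mutation depends on context -- exactly the interaction structure that makes empirical transition statistics, rather than theory, the input to the CWM here.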

Robustness and Reproducibility

The authors report stable CWM synthesis across five independent runs, which supports the robustness and reproducibility of the results and gives some confidence that the approach does not hinge on a single lucky synthesis.

Demerits

Limited Domain Generalization

The evaluation is limited to specific benchmark families -- LeadingOnes, OneMax, Jump$_k$, and the NK-Landscape -- so it is unclear whether CWM-based control generalizes to other optimizers and problem classes. Further research is needed to test the approach in broader contexts.

Expert Commentary

The findings demonstrate the potential of CWMs for parameter control in evolutionary algorithms: an offline-trained, LLM-synthesized simulator can match or beat purpose-built adaptive controllers, and do so with far fewer samples than DQN. That said, the evaluation is confined to a handful of benchmark families, so broader validation is needed before drawing general conclusions about practical deployment.

Recommendations

  • Future research should test CWM-based control beyond the benchmarks studied here, on other optimizers and other problem classes.
  • Given the strong sample efficiency relative to DQN, CWMs merit further development for practical parameter-control applications in optimization and machine learning.
