K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model
arXiv:2602.19128v1
Abstract: Optimizing GPU kernels is critical for efficient modern machine learning systems yet remains challenging due to the complex interplay of design factors and rapid hardware evolution. Existing automated approaches typically treat Large Language Models (LLMs) merely as stochastic code generators within heuristic-guided evolutionary loops. These methods often struggle with complex kernels requiring coordinated, multi-step structural transformations, as they lack explicit planning capabilities and frequently discard promising strategies due to inefficient or incorrect intermediate implementations. To address this, we propose Search via Co-Evolving World Model and build K-Search based on this method. By replacing static search heuristics with a co-evolving world model, our framework leverages LLMs' prior domain knowledge to guide the search, actively exploring the optimization space. This approach explicitly decouples high-level algorithmic planning from low-level program instantiation, enabling the system to navigate non-monotonic optimization paths while remaining resilient to temporary implementation defects. We evaluate K-Search on diverse, complex kernels from FlashInfer, including GQA, MLA, and MoE kernels. Our results show that K-Search significantly outperforms state-of-the-art evolutionary search methods, achieving an average 2.10x improvement and up to a 14.3x gain on complex MoE kernels. On the GPUMode TriMul task, K-Search achieves state-of-the-art performance on H100, reaching 1030us and surpassing both prior evolution and human-designed solutions.
Executive Summary
This article proposes K-Search, an approach to optimizing GPU kernels for efficient machine learning systems. K-Search replaces static search heuristics with a co-evolving world model, using the LLM's prior domain knowledge to guide the search. Because high-level algorithmic planning is decoupled from low-level program instantiation, the system can follow non-monotonic optimization paths and remain resilient to temporary implementation defects. Evaluated on FlashInfer kernels including GQA, MLA, and MoE, it reports an average 2.10x improvement over state-of-the-art evolutionary search methods and up to 14.3x on complex MoE kernels. On the GPUMode TriMul task on H100 it reaches 1030us, surpassing both prior evolutionary and human-designed solutions. How well these gains transfer beyond the evaluated kernels is not yet established.
Key Points
- ▸ K-Search leverages LLMs to guide the search process through a co-evolving world model
- ▸ Decoupling high-level planning from low-level program instantiation lets the system follow non-monotonic optimization paths and stay resilient to temporary implementation defects
- ▸ K-Search reports an average 2.10x improvement over state-of-the-art evolutionary search methods, with up to 14.3x on complex MoE kernels
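The points above can be made concrete with a toy search loop. This is a minimal sketch under stated assumptions, not K-Search's actual algorithm, which the abstract does not spell out. The names `Plan`, `search`, `toy_implement`, `max_attempts`, and `failure_decay` are all hypothetical, the "world model" is reduced to one belief score per plan that an LLM would normally supply and revise, and every latency is invented for illustration. It shows only the general idea that a plan survives a failed implementation attempt instead of being discarded.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Plan:
    """A high-level optimization idea (hypothetical stand-in for a world-model node)."""
    name: str
    expected_speedup: float            # belief about the plan; assumed to come from an LLM
    attempts: int = 0
    latency_us: Optional[float] = None
    closed: bool = False               # True once implemented or abandoned


def search(
    plans: List[Plan],
    implement: Callable[[str, int], Optional[float]],
    budget: int,
    max_attempts: int = 3,
    failure_decay: float = 0.8,
) -> Optional[Plan]:
    """Spend up to `budget` implementation attempts and return the fastest measured plan."""
    for _ in range(budget):
        open_plans = [p for p in plans if not p.closed]
        if not open_plans:
            break
        # Planning step: pick the plan currently believed to be most promising.
        plan = max(open_plans, key=lambda p: p.expected_speedup)
        # Instantiation step: try to turn the plan into a working, benchmarked kernel.
        latency = implement(plan.name, plan.attempts)
        plan.attempts += 1
        if latency is None:
            # A failed attempt lowers the belief but does not discard the plan
            # until it has used up max_attempts.
            plan.expected_speedup *= failure_decay
            plan.closed = plan.attempts >= max_attempts
        else:
            plan.latency_us = latency
            plan.closed = True
    measured = [p for p in plans if p.latency_us is not None]
    return min(measured, key=lambda p: p.latency_us) if measured else None


def toy_implement(name: str, attempt: int) -> Optional[float]:
    """Stand-in for code generation plus benchmarking; all numbers are made up."""
    if name == "fused_pipeline":
        # The more ambitious plan only works on its third attempt.
        return 40.0 if attempt >= 2 else None
    return 80.0


def make_plans() -> List[Plan]:
    return [Plan("simple_tiling", 1.5), Plan("fused_pipeline", 3.0)]


patient = search(make_plans(), toy_implement, budget=5)
impatient = search(make_plans(), toy_implement, budget=5, max_attempts=1)
print(patient.name, patient.latency_us)
print(impatient.name, impatient.latency_us)
```

With `max_attempts=1` the loop drops the ambitious plan after its first failure and settles for the simpler one, which mirrors the failure mode the abstract attributes to heuristic-guided evolutionary loops. With the default setting the same plan is retried and ends up fastest in this toy setup.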
Merits
Strength in LLM Utilization
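Rather than treating the LLM as a stochastic code generator inside a heuristic-guided evolutionary loop, K-Search replaces the static search heuristics with a co-evolving world model, so the LLM's prior domain knowledge steers which optimizations are explored.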
Demerits
Limited Generalizability
The approach's performance improvements may be specific to the kernels and tasks evaluated in the article, and its generalizability to other optimization problems remains uncertain.
Expert Commentary
K-Search's central idea is to separate high-level algorithmic planning from low-level program instantiation, so that a promising strategy is not discarded merely because an intermediate implementation is slow or incorrect. The reported results suggest that LLM-guided search can be effective for kernel optimization, including on kernels that need coordinated, multi-step structural transformations. The evaluation is limited to FlashInfer kernels and the GPUMode TriMul task on H100, so it remains open whether the approach carries over to other kernels, hardware, or optimization problems. That question is the natural next step for K-Search and for other LLM-based approaches to optimization.
Recommendations
- ✓ Further evaluation of K-Search on diverse optimization problems to assess its generalizability
- ✓ Investigation of the potential applications of LLMs in optimization tasks beyond GPU kernel optimization