K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model

Shiyi Cao, Ziming Mao, Joseph E. Gonzalez, Ion Stoica · March 7, 2026

#cs.AI

arXiv:2602.19128v1

Abstract: Optimizing GPU kernels is critical for efficient modern machine learning systems yet remains challenging due to the complex interplay of design factors and rapid hardware evolution. Existing automated approaches typically treat Large Language Models (LLMs) merely as stochastic code generators within heuristic-guided evolutionary loops. These methods often struggle with complex kernels requiring coordinated, multi-step structural transformations, as they lack explicit planning capabilities and frequently discard promising strategies due to inefficient or incorrect intermediate implementations. To address this, we propose Search via Co-Evolving World Model and build K-Search based on this method. By replacing static search heuristics with a co-evolving world model, our framework leverages LLMs' prior domain knowledge to guide the search, actively exploring the optimization space. This approach explicitly decouples high-level algorithmic planning from low-level program instantiation, enabling the system to navigate non-monotonic optimization paths while remaining resilient to temporary implementation defects. We evaluate K-Search on diverse, complex kernels from FlashInfer, including GQA, MLA, and MoE kernels. Our results show that K-Search significantly outperforms state-of-the-art evolutionary search methods, achieving an average 2.10x improvement and up to a 14.3x gain on complex MoE kernels. On the GPUMode TriMul task, K-Search achieves state-of-the-art performance on H100, reaching 1030us and surpassing both prior evolution and human-designed solutions.

Executive Summary

This article proposes K-Search, a framework for optimizing GPU kernels in machine learning systems. K-Search replaces static search heuristics with a world model that co-evolves with the search, using an LLM's prior domain knowledge to plan high-level optimizations separately from low-level code generation. This lets the search follow non-monotonic optimization paths and stay resilient to temporary implementation defects, rather than discarding a promising strategy after a faulty attempt. Evaluated on GQA, MLA, and MoE kernels from FlashInfer, K-Search reports an average 2.10x improvement over state-of-the-art evolutionary search methods, and up to 14.3x on complex MoE kernels. On the GPUMode TriMul task on H100 it reaches 1030us, which the authors report as surpassing both prior evolutionary and human-designed solutions. How far these gains transfer beyond the evaluated kernels remains an open question.

Key Points

▸ K-Search leverages LLMs to guide the search process through a co-evolving world model
▸ Decoupling high-level planning from low-level program instantiation lets the system follow non-monotonic optimization paths while remaining resilient to temporary implementation defects
▸ K-Search reports an average 2.10x improvement over state-of-the-art evolutionary search methods, and up to 14.3x on complex MoE kernels
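
The idea of separating planning from implementation can be illustrated with a toy sketch. This is not K-Search's algorithm or code: the plan names, the promise scores (standing in for an LLM's judgment), the latency table, and every function below are hypothetical, chosen only to show why a search that tolerates failed implementations can reach a plan that a discard-on-failure search never finishes.

```python
from dataclasses import dataclass

BASELINE_US = 100.0  # hypothetical latency of the unoptimized kernel

# Hypothetical per-attempt results for each plan: latency in microseconds,
# or None when the generated implementation is broken.
TOY_OUTCOMES = {
    "tile_and_fuse": [None, None, 40.0],  # strong plan, first two attempts fail
    "unroll_only": [90.0, 88.0, 87.0],    # works at once, small gains
}


@dataclass
class Plan:
    """A high-level strategy, tracked separately from any concrete code."""
    name: str
    promise: float  # estimated value of the plan (stand-in for an LLM prior)
    attempts: int = 0
    best_latency_us: float = float("inf")


def make_plans():
    return [Plan("tile_and_fuse", promise=2.0), Plan("unroll_only", promise=1.2)]


def toy_instantiate(plan):
    """Stand-in for code generation plus benchmarking of one attempt."""
    outcomes = TOY_OUTCOMES[plan.name]
    return outcomes[min(plan.attempts, len(outcomes) - 1)]


def search(plans, budget, max_attempts=3, discard_on_failure=False):
    """Repeatedly try the most promising plan that still has attempts left."""
    for _ in range(budget):
        candidates = [p for p in plans if p.attempts < max_attempts]
        if not candidates:
            break
        plan = max(candidates, key=lambda p: p.promise)
        latency = toy_instantiate(plan)
        plan.attempts += 1
        if latency is None:
            if discard_on_failure:
                plan.attempts = max_attempts  # drop the plan after one failure
            else:
                plan.promise *= 0.8  # mild penalty; the plan stays alive
        else:
            plan.best_latency_us = min(plan.best_latency_us, latency)
            plan.promise = BASELINE_US / latency  # update belief from measurement
    return min(plans, key=lambda p: p.best_latency_us)


if __name__ == "__main__":
    tolerant = search(make_plans(), budget=6)
    greedy = search(make_plans(), budget=6, discard_on_failure=True)
    print(tolerant.name, tolerant.best_latency_us)
    print(greedy.name, greedy.best_latency_us)
```

In this toy setup the tolerant search keeps retrying `tile_and_fuse` after two broken attempts and ends at 40.0us, while the discard-on-failure variant settles on `unroll_only` at 87.0us. The numbers are made up for illustration and say nothing about the paper's measured results.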

Merits

Strength in LLM Utilization

K-Search uses the LLM as a planner rather than only as a stochastic code generator: the model's prior domain knowledge guides which optimizations the search explores.

Demerits

Limited Generalizability

The approach's performance improvements may be specific to the kernels and tasks evaluated in the article, and its generalizability to other optimization problems remains uncertain.

Expert Commentary

K-Search is a promising response to the difficulty of GPU kernel optimization, but the reported gains come from a specific set of FlashInfer kernels and the GPUMode TriMul task, so its generalizability to other optimization problems remains uncertain. Even so, the results suggest that LLMs can contribute planning, not just code generation, to optimization tasks. Evaluating K-Search and similar LLM-based approaches on a broader range of problems is the natural next step.

Recommendations

✓ Further evaluation of K-Search on diverse optimization problems to assess its generalizability
✓ Investigation of the potential applications of LLMs in optimization tasks beyond GPU kernel optimization

Sources

arXiv - cs.AI

K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model
