
LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

arXiv:2602.16953v1 Announce Type: new Abstract: Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback is often expensive and slow to obtain, making online reinforcement learning (RL) impractical. High-coverage hardware verification exemplifies this challenge due to its reliance on industrial simulators and non-differentiable execution signals. We propose LLM4Cov, an offline agent-learning framework that models verification as memoryless state transitions guided by deterministic evaluators. Building on this formulation, we introduce execution-validated data curation, policy-aware agentic data synthesis, and worst-state-prioritized sampling to enable scalable learning under execution constraints. We further curate a reality-aligned benchmark adapted from an existing verification suite through a revised evaluation protocol. Using the proposed pipeline, a compact 4B-parameter model achieves 69.2% coverage pass rate under agentic evaluation, outperforming its teacher by 5.3% and demonstrating competitive performance against models an order of magnitude larger.

Executive Summary

The article proposes LLM4Cov, an offline agent-learning framework for generating high-coverage testbenches in hardware verification. By modeling verification as memoryless state transitions guided by deterministic evaluators, and by introducing execution-validated data curation, policy-aware agentic data synthesis, and worst-state-prioritized sampling, LLM4Cov enables scalable learning under execution constraints. With the proposed pipeline, a compact 4B-parameter model achieves a 69.2% coverage pass rate under agentic evaluation, outperforming its teacher by 5.3% and competing with models an order of magnitude larger. The work matters for hardware verification because it sidesteps expensive, slow online tool feedback through offline learning. That said, its reliance on industrial simulators and non-differentiable execution signals leaves open questions about generalizability and real-world applicability.
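The memoryless formulation can be pictured as a simple state-transition loop: each step depends only on the current testbench and its measured coverage, not on the history of earlier attempts. The sketch below is an illustrative reading of the abstract, not the paper's actual code; `State`, `propose` (standing in for the LLM policy), and `evaluate` (standing in for the deterministic simulator-backed evaluator) are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class State:
    """A verification state: the current testbench and its measured coverage."""
    testbench: str
    coverage: float

def step(state: State,
         propose: Callable[[State], str],
         evaluate: Callable[[str], float]) -> State:
    """One memoryless transition.

    The next state is a function of the current state alone: `propose`
    drafts a revised testbench from it, and the deterministic `evaluate`
    measures the coverage that revision achieves.
    """
    revised = propose(state)
    return State(testbench=revised, coverage=evaluate(revised))
```

Because the evaluator is deterministic, the same (state, proposal) pair always yields the same next state, which is what makes offline curation of transition data feasible.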

Key Points

  • LLM4Cov is an offline agent-learning framework for high-coverage testbench generation in hardware verification.
  • The framework models verification as memoryless state transitions and introduces novel data curation and sampling techniques.
  • LLM4Cov achieves competitive performance against larger models and outperforms its teacher by 5.3%.
  • The framework relies on industrial simulators and non-differentiable execution signals, raising questions about its generalizability.
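Of the three techniques, worst-state-prioritized sampling is the most self-describing: training states with the lowest achieved coverage are drawn more often. The abstract does not give the actual weighting scheme, so the function below is a hypothetical sketch in which a state's sampling weight grows with its coverage deficit.

```python
import random

def worst_state_prioritized_sample(states, coverages, k, temperature=1.0):
    """Sample k states, biased toward those with the lowest coverage.

    Hypothetical sketch: lower coverage -> larger coverage deficit ->
    higher sampling weight. `temperature` sharpens (<1) or flattens (>1)
    the prioritization; neither parameter is specified in the abstract.
    """
    # Weight each state by its coverage deficit (coverage of 1.0 = fully covered).
    weights = [(1.0 - c) ** (1.0 / temperature) for c in coverages]
    if sum(weights) == 0:
        # Every state is fully covered: fall back to uniform sampling.
        return random.sample(states, k)
    return random.choices(states, weights=weights, k=k)
```

A state that is far from its coverage target is then revisited far more often than one that is nearly done, concentrating the offline training budget on the hardest cases.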

Merits

Strength in Scalability

LLM4Cov's offline learning approach enables scalable learning under execution constraints, addressing a significant challenge in hardware verification.

Improved Performance

With a compact 4B-parameter model, the pipeline reaches a 69.2% coverage pass rate, outperforming its teacher by 5.3% while remaining competitive with models an order of magnitude larger.
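The headline metric, coverage pass rate, presumably counts the fraction of benchmark designs whose generated testbench reaches a coverage target. The abstract does not define the protocol, so the threshold and the per-design pass criterion below are assumptions for illustration only.

```python
def coverage_pass_rate(coverages, threshold=0.9):
    """Fraction of benchmark designs whose achieved coverage meets the threshold.

    Hypothetical reading of "coverage pass rate": a design passes when the
    coverage its generated testbench achieves is at least `threshold`.
    The 0.9 default is an assumption, not a value from the paper.
    """
    if not coverages:
        return 0.0
    passed = sum(1 for c in coverages if c >= threshold)
    return passed / len(coverages)
```

Under this reading, the reported 69.2% would mean roughly seven in ten benchmark designs cleared their coverage target under agentic evaluation.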

Real-world Relevance

The article's focus on high-coverage testbench generation and execution-aware learning makes it relevant to real-world hardware verification challenges.

Demerits

Limitation in Generalizability

The article's reliance on industrial simulators and non-differentiable execution signals raises questions about the generalizability of LLM4Cov to diverse hardware verification settings.

Dependence on Simulators

LLM4Cov's performance is highly dependent on industrial simulators, which may not be available or reliable in all scenarios.

Lack of Real-world Evaluation

The evaluation rests on a revised protocol over a reality-aligned benchmark adapted from an existing verification suite; the article does not report deployment in practical, production-scale verification settings.

Expert Commentary

The article makes a notable contribution to hardware verification by replacing expensive, slow online tool feedback with offline agent learning over deterministic evaluators. Its dependence on industrial simulators and non-differentiable execution signals, however, leaves generalizability and real-world applicability open. To strengthen the contribution, the authors could evaluate the framework in production settings and probe how well it transfers across diverse hardware verification scenarios.

Recommendations

  • Future research should focus on evaluating LLM4Cov in real-world settings to demonstrate its effectiveness in practical scenarios.
  • The authors should explore the applicability of LLM4Cov to other hardware verification tasks, such as formal verification or equivalence checking, to assess how broadly the approach generalizes.
