
LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

arXiv:2602.16953v1 Announce Type: new Abstract: Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback is often expensive and slow to obtain, making online reinforcement learning (RL) impractical. High-coverage hardware verification exemplifies this challenge due to its reliance on industrial simulators and non-differentiable execution signals. We propose LLM4Cov, an offline agent-learning framework that models verification as memoryless state transitions guided by deterministic evaluators. Building on this formulation, we introduce execution-validated data curation, policy-aware agentic data synthesis, and worst-state-prioritized sampling to enable scalable learning under execution constraints. We further curate a reality-aligned benchmark adapted from an existing verification suite through a revised evaluation protocol. Using the proposed pipeline, a compact 4B-parameter model achieves 69.2% coverage pass rate under agentic evaluation, outperforming its teacher by 5.3% and demonstrating competitive performance against models an order of magnitude larger.

Executive Summary

The article proposes LLM4Cov, an offline agent-learning framework for generating high-coverage testbenches in hardware verification. By modeling verification as memoryless state transitions guided by deterministic evaluators, and by introducing execution-validated data curation, policy-aware agentic data synthesis, and worst-state-prioritized sampling, LLM4Cov enables scalable learning under execution constraints. With the proposed pipeline, a compact 4B-parameter model achieves a 69.2% coverage pass rate under agentic evaluation, outperforming its teacher by 5.3% and competing with models an order of magnitude larger. The work matters for hardware verification because it sidesteps expensive, slow online tool feedback through offline learning. That said, its reliance on industrial simulators and non-differentiable execution signals leaves open questions about generalizability and real-world applicability.
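The memoryless formulation can be pictured as a simple state-transition loop: each step depends only on the current testbench and its measured coverage, not on the history of earlier attempts. The sketch below is an illustrative reading of the abstract, not the paper's actual code; `State`, `propose` (standing in for the LLM policy), and `evaluate` (standing in for the deterministic simulator-backed evaluator) are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class State:
    """A verification state: the current testbench and its measured coverage."""
    testbench: str
    coverage: float

def step(state: State,
         propose: Callable[[State], str],
         evaluate: Callable[[str], float]) -> State:
    """One memoryless transition.

    The next state is a function of the current state alone: `propose`
    drafts a revised testbench from it, and the deterministic `evaluate`
    measures the coverage that revision achieves.
    """
    revised = propose(state)
    return State(testbench=revised, coverage=evaluate(revised))
```

Because the evaluator is deterministic, the same (state, proposal) pair always yields the same next state, which is what makes offline curation of transition data feasible.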

Key Points

  • LLM4Cov is an offline agent-learning framework for high-coverage testbench generation in hardware verification.
  • The framework models verification as memoryless state transitions and introduces novel data curation and sampling techniques.
  • LLM4Cov achieves competitive performance against larger models and outperforms its teacher by 5.3%.
  • The framework relies on industrial simulators and non-differentiable execution signals, raising questions about its generalizability.
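Of the three techniques, worst-state-prioritized sampling is the most self-describing: training states with the lowest achieved coverage are drawn more often. The abstract does not give the actual weighting scheme, so the function below is a hypothetical sketch in which a state's sampling weight grows with its coverage deficit.

```python
import random

def worst_state_prioritized_sample(states, coverages, k, temperature=1.0):
    """Sample k states, biased toward those with the lowest coverage.

    Hypothetical sketch: lower coverage -> larger coverage deficit ->
    higher sampling weight. `temperature` sharpens (<1) or flattens (>1)
    the prioritization; neither parameter is specified in the abstract.
    """
    # Weight each state by its coverage deficit (coverage of 1.0 = fully covered).
    weights = [(1.0 - c) ** (1.0 / temperature) for c in coverages]
    if sum(weights) == 0:
        # Every state is fully covered: fall back to uniform sampling.
        return random.sample(states, k)
    return random.choices(states, weights=weights, k=k)
```

A state that is far from its coverage target is then revisited far more often than one that is nearly done, concentrating the offline training budget on the hardest cases.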

Merits

Strength in Scalability

LLM4Cov's offline learning approach enables scalable learning under execution constraints, addressing a significant challenge in hardware verification.

Improved Performance

With a compact 4B-parameter model, the pipeline reaches a 69.2% coverage pass rate, outperforming its teacher by 5.3% while remaining competitive with models an order of magnitude larger.
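The headline metric, coverage pass rate, presumably counts the fraction of benchmark designs whose generated testbench reaches a coverage target. The abstract does not define the protocol, so the threshold and the per-design pass criterion below are assumptions for illustration only.

```python
def coverage_pass_rate(coverages, threshold=0.9):
    """Fraction of benchmark designs whose achieved coverage meets the threshold.

    Hypothetical reading of "coverage pass rate": a design passes when the
    coverage its generated testbench achieves is at least `threshold`.
    The 0.9 default is an assumption, not a value from the paper.
    """
    if not coverages:
        return 0.0
    passed = sum(1 for c in coverages if c >= threshold)
    return passed / len(coverages)
```

Under this reading, the reported 69.2% would mean roughly seven in ten benchmark designs cleared their coverage target under agentic evaluation.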

Real-world Relevance

The article's focus on high-coverage testbench generation and execution-aware learning makes it relevant to real-world hardware verification challenges.

Demerits

Limitation in Generalizability

The article's reliance on industrial simulators and non-differentiable execution signals raises questions about the generalizability of LLM4Cov to diverse hardware verification settings.

Dependence on Simulators

LLM4Cov's performance is highly dependent on industrial simulators, which may not be available or reliable in all scenarios.

Lack of Real-world Evaluation

The evaluation rests on a revised protocol over a reality-aligned benchmark adapted from an existing verification suite; the article does not report deployment in practical, production-scale verification settings.

Expert Commentary

The article makes a notable contribution to hardware verification by replacing expensive, slow online tool feedback with offline agent learning over deterministic evaluators. Its dependence on industrial simulators and non-differentiable execution signals, however, leaves generalizability and real-world applicability open. To strengthen the contribution, the authors could evaluate the framework in production settings and probe how well it transfers across diverse hardware verification scenarios.

Recommendations

  • Future research should focus on evaluating LLM4Cov in real-world settings to demonstrate its effectiveness in practical scenarios.
  • The authors should explore the applicability of LLM4Cov to other hardware verification tasks, such as formal verification or equivalence checking, to assess how broadly the approach generalizes.
