Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL
arXiv:2603.09161v1 Announce Type: new

Abstract: Learning effective netlist representations is fundamentally constrained by the scarcity of labeled datasets, as real designs are protected as Intellectual Property (IP) and are costly to annotate. Existing work therefore focuses on small-scale circuits with clean labels, limiting scalability to realistic designs. Meanwhile, Large Language Models (LLMs) can generate Register-Transfer Level (RTL) code at scale, but their functional incorrectness has hindered their use in circuit analysis. In this work, we make a key observation: even when LLM-generated RTL is functionally imperfect, the synthesized netlists still preserve structural patterns that are strongly indicative of the intended functionality. Building on this insight, we propose a cost-effective data augmentation and training framework that systematically exploits imperfect LLM-generated RTL as training data for netlist representation learning, forming an end-to-end pipeline from automated code generation to downstream tasks. We conduct evaluations on circuit functional understanding tasks, including sub-circuit boundary identification and component classification, across benchmarks of increasing scale, extending the task scope from the operator level to the IP level. The evaluations demonstrate that models trained on our noisy synthetic corpus generalize well to real-world netlists, matching or even surpassing methods trained on scarce high-quality data and effectively breaking the data bottleneck in circuit representation learning.
Executive Summary
This article presents a novel approach to learning netlist representations by leveraging Large Language Models (LLMs) to generate Register-Transfer Level (RTL) code at scale. Although LLM-generated RTL is often functionally imperfect, the synthesized netlists retain structural patterns indicative of the intended functionality. The authors propose a cost-effective data augmentation and training framework that exploits this insight, forming an end-to-end pipeline from automated code generation to downstream tasks. Evaluations on circuit functional understanding tasks demonstrate that models trained on noisy synthetic data generalize well to real-world netlists, matching or surpassing methods trained on scarce high-quality data. This approach could significantly alleviate the data bottleneck in circuit representation learning, enabling the analysis of larger and more complex designs.
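The pipeline just described can be sketched in a few lines. The sketch below is purely illustrative, not the authors' implementation: all function names and data shapes are hypothetical stand-ins, and a real pipeline would call an LLM and a logic synthesizer (e.g. Yosys) where the stubs sit. The key point it captures is that the training label comes from the generation prompt, not from the (possibly buggy) code itself.

```python
# Hypothetical sketch of the end-to-end pipeline: prompt -> RTL ->
# netlist -> structural features -> labeled training example.
# All names and return values are illustrative stand-ins.

def generate_rtl(prompt):
    """Stand-in for LLM-based RTL generation (output may be buggy)."""
    return "module add(input [3:0] a, b, output [3:0] s); assign s = a + b; endmodule"

def synthesize(rtl):
    """Stand-in for logic synthesis (RTL -> gate-level netlist).
    Here the 'netlist' is just a toy list of (gate_type, fanin) pairs."""
    return [("XOR", 2), ("AND", 2), ("XOR", 3), ("OR", 2)]

def structural_features(netlist):
    """Gate-type histogram: a crude structural fingerprint of the netlist."""
    feats = {}
    for gate_type, _ in netlist:
        feats[gate_type] = feats.get(gate_type, 0) + 1
    return feats

rtl = generate_rtl("4-bit adder")
netlist = synthesize(rtl)
label = "adder"  # the label is known from the prompt, even if the RTL is wrong
example = (structural_features(netlist), label)
print(example)  # ({'XOR': 2, 'AND': 1, 'OR': 1}, 'adder')
```

In practice the structural fingerprint would be a learned graph embedding rather than a histogram, but the flow is the same: the prompt supplies supervision that survives functional bugs in the generated code.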
Key Points
- ▸ LLMs can generate RTL code at scale, but their functional incorrectness has hindered their use in circuit analysis.
- ▸ Imperfect LLM-generated RTL can still preserve structural patterns indicative of intended functionality.
- ▸ A cost-effective data augmentation and training framework is proposed to leverage this insight.
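The second key point above can be made concrete with a toy example (not from the paper). Suppose a functionally buggy full adder differs from the correct one by a single wrong gate: its gate-level structure stays almost identical, while a genuinely different circuit looks much less similar. The netlists and the histogram-plus-cosine similarity below are illustrative assumptions, standing in for the learned representations the paper actually uses.

```python
# Toy illustration of the core observation: a functionally buggy
# netlist remains structurally close to the intended design.
# Netlists are modeled as bags of gate types; similarity is cosine
# similarity between gate-type histograms.
from collections import Counter
import math

def cosine(h1, h2):
    keys = set(h1) | set(h2)
    dot = sum(h1.get(k, 0) * h2.get(k, 0) for k in keys)
    n1 = math.sqrt(sum(v * v for v in h1.values()))
    n2 = math.sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2)

# Correct full adder: s = a^b^cin, cout = (a&b) | (cin & (a^b))
correct = Counter(["XOR", "XOR", "AND", "AND", "OR"])
# Buggy variant: one AND mistakenly replaced by OR -> wrong carry,
# yet the overall gate mix is nearly unchanged.
buggy = Counter(["XOR", "XOR", "AND", "OR", "OR"])
# An unrelated toy design, for contrast.
unrelated = Counter(["NOT", "NOT", "AND", "AND", "OR", "OR", "OR"])

print(round(cosine(correct, buggy), 3))      # high: structure preserved
print(round(cosine(correct, unrelated), 3))  # lower: different structure
```

A model trained to map such structural signatures to functional labels can therefore still learn "adder-ness" from the buggy sample, which is precisely the leverage the proposed framework extracts from imperfect LLM output.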
Merits
Strength in scalability
The proposed framework can handle large-scale designs that are difficult to annotate and analyze using traditional methods.
Improved generalizability
Models trained on noisy synthetic data can generalize well to real-world netlists, reducing the dependence on scarce high-quality data.
Demerits
Potential for errors
The use of imperfect LLM-generated RTL may introduce errors or biases in the netlist representations, which need to be carefully evaluated and mitigated.
Limited understanding of structural patterns
The article assumes that structural patterns are strongly indicative of intended functionality, but the extent to which this is true remains unclear.
Expert Commentary
The article presents a significant advance in circuit representation learning, leveraging the ability of LLMs to generate RTL code at scale. While the proposed framework has its limitations, it demonstrates a promising route to alleviating the data bottleneck in circuit analysis. The results are encouraging, though further work is needed to understand how reliably structural patterns indicate intended functionality and to evaluate the framework's robustness. If these questions are resolved, the authors' approach could substantially reshape how training data is obtained for circuit design and analysis.
Recommendations
- ✓ Further research is needed to evaluate the robustness and generalizability of the proposed framework across different domains and designs.
- ✓ The authors should investigate the extent to which structural patterns are indicative of intended functionality and explore ways to improve the accuracy of these patterns.