Learning to Generate Secure Code via Token-Level Rewards
arXiv:2602.23407v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. Existing approaches commonly suffer from two key limitations: the scarcity of high-quality security data and coarse-grained reinforcement learning reward signals. To address these challenges, we propose Vul2Safe, a new secure code generation framework that leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities, and further generates diverse implicit prompts to build the PrimeVul+ dataset. Meanwhile, we introduce SRCode, a novel training framework that pioneers the use of token-level rewards in reinforcement learning for code security, which enables the model to continuously attend to and reinforce critical fine-grained security patterns during training. Compared with traditional instance-level reward schemes, our approach allows for more precise optimization of local security implementations. Extensive experiments show that PrimeVul+ and SRCode substantially reduce security vulnerabilities in generated code while improving overall code quality across multiple benchmarks.
Executive Summary
This paper proposes Vul2Safe, a secure code generation framework, and SRCode, a complementary training framework, to address two limitations of existing approaches to secure code generation with large language models (LLMs): scarce high-quality security data and coarse-grained reinforcement learning rewards. Vul2Safe leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities, while SRCode applies token-level rewards in reinforcement learning so the model continuously attends to and reinforces critical fine-grained security patterns during training. The proposed approach is shown to substantially reduce security vulnerabilities in generated code while improving overall code quality across multiple benchmarks. This work has significant implications for the development of secure software and highlights the potential of LLMs in addressing complex security challenges.
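The data-construction side of the pipeline can be sketched as follows. This is an illustrative outline, not the paper's actual implementation: `stub_repair_model` stands in for the LLM self-reflection repair call, and `verifier` stands in for whatever high-confidence filter Vul2Safe uses; both names and the toy static check are assumptions.

```python
# Hypothetical sketch of a Vul2Safe-style repair-pair pipeline.
# A vulnerable snippet is sent to a model for repair, and a verifier
# filters out unconvincing fixes, so only high-confidence
# (vulnerable, secure) pairs enter the training set.

import re

def stub_repair_model(vulnerable_code: str) -> str:
    """Stand-in for an LLM self-reflection repair call (assumption)."""
    # Illustrative fix: swap an unbounded C string copy for a bounded one.
    return vulnerable_code.replace(
        "strcpy(dst, src)", "strncpy(dst, src, sizeof(dst) - 1)"
    )

def verifier(code: str) -> bool:
    """Toy static check: reject code that still contains strcpy calls."""
    return re.search(r"\bstrcpy\s*\(", code) is None

def build_repair_pairs(vulnerable_snippets):
    pairs = []
    for vuln in vulnerable_snippets:
        fixed = stub_repair_model(vuln)
        # Keep only repairs that pass the check and actually changed the code.
        if verifier(fixed) and fixed != vuln:
            pairs.append((vuln, fixed))
    return pairs

pairs = build_repair_pairs(["strcpy(dst, src);", "memcpy(dst, src, n);"])
print(len(pairs))  # → 1: the memcpy snippet was left unchanged, so it is dropped
```

In the real system the verifier would be far stronger (e.g., a security analyzer or the self-reflection loop itself); the point is only that filtering yields high-confidence pairs rather than trusting every model repair.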
Key Points
- ▸ The proposed Vul2Safe framework leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities.
- ▸ The SRCode framework utilizes token-level rewards in reinforcement learning to enable the model to attend to fine-grained security patterns.
- ▸ The approach is shown to substantially reduce security vulnerabilities in generated code and improve overall code quality.
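The contrast between instance-level and token-level rewards can be illustrated with a minimal sketch. The abstract does not specify SRCode's actual reward formula, so the bonus scheme, function names, and token set below are all assumptions chosen to show the shape of the idea: a single scalar reward smears credit uniformly over every token, whereas a token-level scheme concentrates extra credit on security-critical tokens.

```python
# Illustrative contrast: instance-level vs. token-level reward assignment.
# The exact SRCode reward design is not given in the abstract; the 0.5
# bonus and the notion of a "security token set" are assumptions.

def instance_level_advantages(tokens, reward):
    # One scalar reward broadcast to every token of the sampled program.
    return [reward] * len(tokens)

def token_level_advantages(tokens, reward, security_tokens, bonus=0.5):
    # Tokens implementing a security pattern (e.g., a bounds check)
    # receive extra credit, so gradients focus on the local fix.
    return [reward + (bonus if t in security_tokens else 0.0) for t in tokens]

tokens = ["if", "(", "n", "<", "len", ")", "copy", "(", "dst", ",", "src", ",", "n", ")"]
coarse = instance_level_advantages(tokens, 1.0)
fine = token_level_advantages(tokens, 1.0, security_tokens={"n", "<", "len"})

print(coarse[2], fine[2])  # → 1.0 1.5: the bounds-check token "n" earns extra credit
```

Under the coarse scheme, a program that is mostly correct but misses one bounds check gets the same per-token signal everywhere; the token-level scheme lets the policy gradient single out exactly the tokens where the security-relevant behavior lives.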
Merits
Strength in Security
The proposed approach provides a more precise optimization of local security implementations, reducing security vulnerabilities in generated code.
Improved Code Quality
The framework is shown to improve overall code quality across multiple benchmarks, indicating a significant advancement in code generation capabilities.
Demerits
Data Scarcity
The proposed approach relies on the availability of high-quality security data, which may be scarce in certain domains or scenarios.
Dependence on LLMs
The effectiveness of the proposed approach is contingent on the capabilities of LLMs, which may have limitations in understanding complex security concepts.
Expert Commentary
The article presents a novel and promising approach to secure code generation using LLMs and reinforcement learning. The proposed framework, Vul2Safe, and training framework, SRCode, represent a meaningful advance in both code generation quality and security. However, the approach depends on the availability of high-quality security data and on the underlying LLMs, whose grasp of complex security concepts remains limited. The practical implications are nonetheless substantial: the method could enable developers to generate more secure code of higher quality. The policy implications, by contrast, are less clear and warrant further exploration.
Recommendations
- ✓ Developers and researchers should explore the potential of LLMs and reinforcement learning in addressing complex security challenges.
- ✓ Further research is needed to investigate the scalability and applicability of the proposed approach in various domains and scenarios.