Learning to Generate Secure Code via Token-Level Rewards
arXiv:2602.23407v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. Existing approaches commonly suffer from two key limitations: the scarcity of high-quality security data and coarse-grained reinforcement learning reward signals. To address these challenges, we propose Vul2Safe, a new secure code generation framework that leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities, and further generates diverse implicit prompts to build the PrimeVul+ dataset. Meanwhile, we introduce SRCode, a novel training framework that pioneers the use of token-level rewards in reinforcement learning for code security, which enables the model to continuously attend to and reinforce critical fine-grained security patterns during training. Compared with traditional instance-level reward schemes, our approach allows for more precise optimization of local security implementations. Extensive experiments show that PrimeVul+ and SRCode substantially reduce security vulnerabilities in generated code while improving overall code quality across multiple benchmarks.
Executive Summary
This paper proposes Vul2Safe, a secure code generation framework, and SRCode, a complementary training framework, to address two limitations of existing approaches to secure code generation with large language models (LLMs): scarce high-quality security data and coarse-grained reinforcement learning rewards. Vul2Safe leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities, while SRCode applies token-level rewards in reinforcement learning so the model continuously attends to and reinforces critical fine-grained security patterns during training. The proposed approach is shown to substantially reduce security vulnerabilities in generated code while improving overall code quality across multiple benchmarks. This work has significant implications for the development of secure software and highlights the potential of LLMs in addressing complex security challenges.
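The data-construction side of the pipeline can be sketched as follows. This is an illustrative outline, not the paper's actual implementation: `stub_repair_model` stands in for the LLM self-reflection repair call, and `verifier` stands in for whatever high-confidence filter Vul2Safe uses; both names and the toy static check are assumptions.

```python
# Hypothetical sketch of a Vul2Safe-style repair-pair pipeline.
# A vulnerable snippet is sent to a model for repair, and a verifier
# filters out unconvincing fixes, so only high-confidence
# (vulnerable, secure) pairs enter the training set.

import re

def stub_repair_model(vulnerable_code: str) -> str:
    """Stand-in for an LLM self-reflection repair call (assumption)."""
    # Illustrative fix: swap an unbounded C string copy for a bounded one.
    return vulnerable_code.replace(
        "strcpy(dst, src)", "strncpy(dst, src, sizeof(dst) - 1)"
    )

def verifier(code: str) -> bool:
    """Toy static check: reject code that still contains strcpy calls."""
    return re.search(r"\bstrcpy\s*\(", code) is None

def build_repair_pairs(vulnerable_snippets):
    pairs = []
    for vuln in vulnerable_snippets:
        fixed = stub_repair_model(vuln)
        # Keep only repairs that pass the check and actually changed the code.
        if verifier(fixed) and fixed != vuln:
            pairs.append((vuln, fixed))
    return pairs

pairs = build_repair_pairs(["strcpy(dst, src);", "memcpy(dst, src, n);"])
print(len(pairs))  # → 1: the memcpy snippet was left unchanged, so it is dropped
```

In the real system the verifier would be far stronger (e.g., a security analyzer or the self-reflection loop itself); the point is only that filtering yields high-confidence pairs rather than trusting every model repair.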
Key Points
- ▸ The proposed Vul2Safe framework leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities.
- ▸ The SRCode framework utilizes token-level rewards in reinforcement learning to enable the model to attend to fine-grained security patterns.
- ▸ The approach is shown to substantially reduce security vulnerabilities in generated code and improve overall code quality.
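The contrast between instance-level and token-level rewards can be illustrated with a minimal sketch. The abstract does not specify SRCode's actual reward formula, so the bonus scheme, function names, and token set below are all assumptions chosen to show the shape of the idea: a single scalar reward smears credit uniformly over every token, whereas a token-level scheme concentrates extra credit on security-critical tokens.

```python
# Illustrative contrast: instance-level vs. token-level reward assignment.
# The exact SRCode reward design is not given in the abstract; the 0.5
# bonus and the notion of a "security token set" are assumptions.

def instance_level_advantages(tokens, reward):
    # One scalar reward broadcast to every token of the sampled program.
    return [reward] * len(tokens)

def token_level_advantages(tokens, reward, security_tokens, bonus=0.5):
    # Tokens implementing a security pattern (e.g., a bounds check)
    # receive extra credit, so gradients focus on the local fix.
    return [reward + (bonus if t in security_tokens else 0.0) for t in tokens]

tokens = ["if", "(", "n", "<", "len", ")", "copy", "(", "dst", ",", "src", ",", "n", ")"]
coarse = instance_level_advantages(tokens, 1.0)
fine = token_level_advantages(tokens, 1.0, security_tokens={"n", "<", "len"})

print(coarse[2], fine[2])  # → 1.0 1.5: the bounds-check token "n" earns extra credit
```

Under the coarse scheme, a program that is mostly correct but misses one bounds check gets the same per-token signal everywhere; the token-level scheme lets the policy gradient single out exactly the tokens where the security-relevant behavior lives.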
Merits
Strength in Security
The proposed approach provides a more precise optimization of local security implementations, reducing security vulnerabilities in generated code.
Improved Code Quality
The framework is shown to improve overall code quality across multiple benchmarks, indicating a significant advancement in code generation capabilities.
Demerits
Data Scarcity
The proposed approach relies on the availability of high-quality security data, which may be scarce in certain domains or scenarios.
Dependence on LLMs
The effectiveness of the proposed approach is contingent on the capabilities of LLMs, which may have limitations in understanding complex security concepts.
Expert Commentary
The article presents a novel and promising approach to secure code generation using LLMs and reinforcement learning. The proposed framework, Vul2Safe, and training framework, SRCode, represent a meaningful advance in both code generation quality and security. However, the approach depends on the availability of high-quality security data and on the underlying LLMs, whose grasp of complex security concepts remains limited. The practical implications are nonetheless substantial: the method could enable developers to generate more secure code of higher quality. The policy implications, by contrast, are less clear and warrant further exploration.
Recommendations
- ✓ Developers and researchers should explore the potential of LLMs and reinforcement learning in addressing complex security challenges.
- ✓ Further research is needed to investigate the scalability and applicability of the proposed approach in various domains and scenarios.