EVMbench: Evaluating AI Agents on Smart Contract Security
arXiv:2603.04915v1, Announce Type: new
Abstract: Smart contracts on public blockchains now manage large amounts of value, and vulnerabilities in these systems can lead to substantial losses. As AI agents become more capable at reading, writing, and running code, it is natural to ask how well they can already navigate this landscape, both in ways that improve security and in ways that might increase risk. We introduce EVMbench, an evaluation that measures the ability of agents to detect, patch, and exploit smart contract vulnerabilities. EVMbench draws on 117 curated vulnerabilities from 40 repositories and, in the most realistic setting, uses programmatic grading based on tests and blockchain state under a local Ethereum execution environment. We evaluate a range of frontier agents and find that they are capable of discovering and exploiting vulnerabilities end-to-end against live blockchain instances. We release code, tasks, and tooling to support continued measurement of these capabilities and future work on security.
Executive Summary
The article introduces EVMbench, an evaluation that measures how well AI agents detect, patch, and exploit smart contract vulnerabilities on public blockchains. It draws on 117 curated vulnerabilities from 40 repositories and, in its most realistic setting, grades agents programmatically using tests and blockchain state in a local Ethereum execution environment. The authors report that frontier agents can discover and exploit vulnerabilities end-to-end against live blockchain instances, which points to both the defensive benefits and the added risk of AI in smart contract security. The released code, tasks, and tooling support continued measurement and future security research.
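To make the idea of grading on blockchain state concrete, here is a minimal sketch in Python. It is not EVMbench's grader: the `ChainSnapshot` type, the `exploit_succeeded` function, the account labels, and the drain threshold are all hypothetical, and the balances are plain dictionaries rather than values read from a real Ethereum node. The sketch only illustrates the general pattern of comparing state before and after an agent's transactions.

```python
from dataclasses import dataclass

WEI_PER_ETHER = 10**18


@dataclass
class ChainSnapshot:
    """Hypothetical snapshot of account balances (in wei) on a local test chain."""
    balances: dict[str, int]


def exploit_succeeded(
    before: ChainSnapshot,
    after: ChainSnapshot,
    attacker: str,
    victim: str,
    min_drained_wei: int,
) -> bool:
    """Assumed success rule: the victim lost at least `min_drained_wei`
    and the attacker's balance increased."""
    drained = before.balances.get(victim, 0) - after.balances.get(victim, 0)
    gained = after.balances.get(attacker, 0) - before.balances.get(attacker, 0)
    return drained >= min_drained_wei and gained > 0


# Illustrative values only; these are not taken from the paper.
before = ChainSnapshot({"vault": 10 * WEI_PER_ETHER, "attacker": 1 * WEI_PER_ETHER})
after = ChainSnapshot({"vault": 0, "attacker": 11 * WEI_PER_ETHER})
print(exploit_succeeded(before, after, "attacker", "vault", 5 * WEI_PER_ETHER))
```

A state-based check like this grades the outcome rather than the agent's transcript, so any sequence of transactions that reaches the target state counts as a success.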
Key Points
- Introduction of EVMbench, a framework for evaluating AI agents on smart contract security
- Evaluation of AI agents' capabilities in detecting, patching, and exploiting smart contract vulnerabilities
- Use of 117 curated vulnerabilities from 40 repositories in a realistic setting
Merits
Comprehensive Evaluation Framework
EVMbench covers three tasks (detection, patching, and exploitation) across 117 curated vulnerabilities from 40 repositories, and its most realistic setting grades agents programmatically on tests and blockchain state rather than on self-reported results.
Demerits
Limited Generalizability
The evaluation covers a fixed set of frontier agents and curated vulnerabilities, so its results may not be representative of the broader smart contract security landscape.
Expert Commentary
EVMbench is a significant step forward in evaluating what AI agents can do in smart contract security. By measuring detection, patching, and exploitation together, it shows both how these agents could strengthen defenses and how they could increase risk. As the use of AI in smart contract security grows, effective evaluation frameworks and regulatory standards will be needed to ensure that AI-powered security tools and services are developed and deployed responsibly. The findings matter to practitioners and policymakers alike, since both need to weigh AI's potential impact on smart contract security.
Recommendations
- Conduct further research on more effective evaluation frameworks for AI agents in smart contract security
- Establish regulatory frameworks and standards for the responsible development and deployment of AI in smart contract security