Academic

EVMbench: Evaluating AI Agents on Smart Contract Security

arXiv:2603.04915v1 · Announce Type: new

Abstract: Smart contracts on public blockchains now manage large amounts of value, and vulnerabilities in these systems can lead to substantial losses. As AI agents become more capable at reading, writing, and running code, it is natural to ask how well they can already navigate this landscape, both in ways that improve security and in ways that might increase risk. We introduce EVMbench, an evaluation that measures the ability of agents to detect, patch, and exploit smart contract vulnerabilities. EVMbench draws on 117 curated vulnerabilities from 40 repositories and, in the most realistic setting, uses programmatic grading based on tests and blockchain state under a local Ethereum execution environment. We evaluate a range of frontier agents and find that they are capable of discovering and exploiting vulnerabilities end-to-end against live blockchain instances. We release code, tasks, and tooling to support continued measurement of these capabilities…
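
The abstract mentions programmatic grading based on tests and blockchain state. The Python sketch below illustrates what such checks might look like; it is not EVMbench's actual grader. It assumes, hypothetically, that chain state can be snapshotted as a mapping from address to balance, that an exploit counts as successful when an attacker address controlled by the agent gains at least a fixed amount, and that a patch passes when existing functional tests still succeed while a known exploit test no longer does. The names `ExploitTask`, `grade_exploit`, and `grade_patch` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical snapshot of chain state: address -> balance in wei.
State = Dict[str, int]


@dataclass
class ExploitTask:
    attacker: str        # hypothetical address controlled by the agent
    min_profit_wei: int  # hypothetical gain threshold for a successful exploit


def grade_exploit(task: ExploitTask, before: State, after: State) -> bool:
    """Pass if the attacker's balance rose by at least the threshold."""
    gained = after.get(task.attacker, 0) - before.get(task.attacker, 0)
    return gained >= task.min_profit_wei


def grade_patch(run_functional_tests: Callable[[], bool],
                run_exploit_test: Callable[[], bool]) -> bool:
    """Pass if intended behaviour still works and the known exploit now fails."""
    # Short-circuits: the exploit test only runs if the functional tests pass.
    return run_functional_tests() and not run_exploit_test()


if __name__ == "__main__":
    task = ExploitTask(attacker="0xA11CE", min_profit_wei=10**18)
    before = {"0xA11CE": 0, "0xVAULT": 5 * 10**18}
    after = {"0xA11CE": 5 * 10**18, "0xVAULT": 0}
    print(grade_exploit(task, before, after))        # True
    print(grade_patch(lambda: True, lambda: False))  # True
```

Grading on state deltas rather than on the agent's own transcript means the grader does not have to trust the agent's account of what it did. A real harness would presumably read balances from a local Ethereum node rather than from in-memory dictionaries.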

Justin Wang, Andreas Bigger, Xiaohai Xu, Justin W. Lin, Andy Applebaum, Tejal Patwardhan, Alpin Yukseloglu, Olivia Watkins · March 7, 2026

#cs.LG #cs.AI #cs.CR

Executive Summary

The article introduces EVMbench, an evaluation framework for assessing how well AI agents detect, patch, and exploit vulnerabilities in smart contracts on public blockchains. It draws on 117 curated vulnerabilities from 40 repositories and, in its most realistic setting, grades agents programmatically using tests and blockchain state in a local Ethereum execution environment. The results show that frontier agents can discover and exploit vulnerabilities end-to-end against live blockchain instances, which points to both the security benefits and the risks of capable AI agents in this domain. The authors release EVMbench's code, tasks, and tooling to support continued measurement of these capabilities.

Key Points

▸ Introduction of EVMbench, a framework for evaluating AI agents on smart contract security
▸ Evaluation of AI agents' capabilities in detecting, patching, and exploiting smart contract vulnerabilities
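▸ Use of 117 curated vulnerabilities from 40 repositories, graded programmatically on tests and blockchain state in a local Ethereum execution environment
▸ Finding that frontier agents can discover and exploit vulnerabilities end-to-end against live blockchain instances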
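
Of the three tasks, detection is the one least directly graded by chain state. One simple, purely illustrative way to score it is recall against the curated vulnerability list, assuming each finding an agent reports has already been matched to a vulnerability ID. The function name and the ID format below are hypothetical, and EVMbench's actual detection scoring may differ.

```python
from typing import Iterable, Set


def detection_recall(reported: Iterable[str], curated: Set[str]) -> float:
    """Fraction of curated vulnerabilities that appear among the agent's findings.

    Assumes findings are already mapped to vulnerability IDs (hypothetical).
    Findings that match nothing in the curated set do not affect recall.
    """
    if not curated:
        raise ValueError("curated set must be non-empty")
    hits = set(reported) & curated
    return len(hits) / len(curated)


if __name__ == "__main__":
    curated = {"V-1", "V-2", "V-3", "V-4"}
    print(detection_recall(["V-1", "V-3", "X-9"], curated))  # 0.5
```

Because recall ignores unmatched findings, an agent could inflate it by reporting many speculative issues; a practical scheme would likely also track precision or cap the number of findings per task.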

Merits

Comprehensive Evaluation Framework

EVMbench assesses AI agents across three security tasks (detection, patching, and exploitation) using 117 curated vulnerabilities from 40 repositories, and in its most realistic setting it grades results programmatically from tests and blockchain state.

Demerits

Limited Generalizability

The evaluation covers a specific set of frontier agents and 117 vulnerabilities from 40 repositories, run in a local Ethereum execution environment, so its results may not generalize to the broader smart contract security landscape.

Expert Commentary

EVMbench is a significant step forward in evaluating AI agents on smart contract security. Its realistic, programmatically graded setup offers concrete evidence of both the potential benefits and the risks of AI in this domain: the same end-to-end capability that lets an agent find and patch a vulnerability also lets it exploit one. As AI use in smart contract security grows, effective evaluation frameworks and regulatory standards will be essential to ensure that AI-powered security tools and services are developed and deployed responsibly. The paper's findings therefore matter to practitioners and policymakers alike, who will need to weigh AI's potential impact on smart contract security carefully.

Recommendations

✓ Further research on more effective evaluation frameworks for AI agents in smart contract security
✓ The establishment of regulatory frameworks and standards for the responsible development and deployment of AI in smart contract security

Sources

arXiv - cs.LG

EVMbench: Evaluating AI Agents on Smart Contract Security

Related Articles

ConstitutionGPT: An AI-Powered Multilingual Legal Assistance System for Indian Citizens

AI Copyright Infringement: Navigating the Legal Risks of AI-Generated Content

The Rhetoric of Machine Learning

Busemann energy-based attention for emotion analysis in Poincaré discs

JCG, PC

HSOLLC Co., Ltd.
