Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP

arXiv:2603.20405v1. Abstract: We report on an experiment in which Claude Opus 4.6, equipped with a suite of Model Context Protocol (MCP) tools for the Rocq proof assistant, autonomously proved 10 of 12 problems from the 2025 Putnam Mathematical Competition. The MCP tools, designed with Claude by analyzing logs from a prior experiment on miniF2F-Rocq, encode a "compile-first, interactive-fallback" strategy. Running on an isolated VM with no internet access, the agent deployed 141 subagents over 17.7 hours of active compute (51.6h wall-clock), consuming approximately 1.9 billion tokens. All proofs are publicly available.

Executive Summary

This article reports on an experiment in which Claude Opus 4.6, equipped with Model Context Protocol (MCP) tools for the Rocq proof assistant, autonomously proved 10 of 12 problems from the 2025 Putnam Mathematical Competition. The MCP tools encode a 'compile-first, interactive-fallback' strategy; the agent itself deployed 141 subagents over 17.7 hours of active compute (51.6 hours wall-clock) and consumed approximately 1.9 billion tokens. The results demonstrate the potential of AI-assisted proof verification and show how MCP tooling can extend what an agent can do with the Rocq proof assistant. However, the experiment's isolated setup and small problem set (12 problems from a single competition) raise questions about the generalizability of these findings.

Key Points

  • Claude Opus 4.6, using MCP tools for the Rocq proof assistant, autonomously proved 10 of 12 Putnam 2025 problems.
  • The MCP tools encode a 'compile-first, interactive-fallback' strategy.
  • The agent deployed 141 subagents over 17.7 hours of active compute (51.6 hours wall-clock), consuming approximately 1.9 billion tokens.
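
The abstract names the 'compile-first, interactive-fallback' strategy but does not spell out its mechanics. The following is a minimal Python sketch of one plausible reading: submit the whole proof script for compilation first, and only if that fails, replay it one tactic at a time to locate the first failing step. Every name here (prove, CompileResult, toy_compile, toy_step) is hypothetical and chosen for illustration; none of it is the Rocq-MCP API, and the toy checker stands in for a real Rocq session.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CompileResult:
    ok: bool
    error: Optional[str] = None


def prove(
    script: List[str],
    compile_whole: Callable[[List[str]], CompileResult],
    step: Callable[[List[str], str], bool],
) -> dict:
    """Compile the whole script first; on failure, fall back to stepping."""
    if compile_whole(script).ok:
        return {"mode": "compile", "accepted": len(script), "failed_at": None}

    # Interactive fallback: feed tactics one at a time so the first
    # failing step can be reported back to whoever repairs the proof.
    accepted: List[str] = []
    for index, tactic in enumerate(script):
        if not step(accepted, tactic):
            return {"mode": "interactive", "accepted": index, "failed_at": index}
        accepted.append(tactic)
    return {"mode": "interactive", "accepted": len(accepted), "failed_at": None}


# Toy stand-ins for a proof assistant (assumption: "bogus." is the only bad tactic).
def toy_step(accepted: List[str], tactic: str) -> bool:
    return tactic != "bogus."


def toy_compile(script: List[str]) -> CompileResult:
    for tactic in script:
        if tactic == "bogus.":
            return CompileResult(ok=False, error="unknown tactic: bogus.")
    return CompileResult(ok=True)


if __name__ == "__main__":
    print(prove(["intros.", "reflexivity."], toy_compile, toy_step))
    print(prove(["intros.", "bogus.", "reflexivity."], toy_compile, toy_step))
```

Under this reading, the cheap whole-file check handles proofs that are already correct, and the step-by-step replay is reserved for the cases that need diagnosis.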

Merits

Advancements in AI-Assisted Proof Verification

The experiment demonstrates the potential of AI-assisted proof verification: an agent working without human intervention produced Rocq proofs for 10 of 12 competition problems, and the proofs are publicly available for inspection.

Improved Efficiency and Scalability

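The agent spread the work across 141 subagents and finished in 17.7 hours of active compute (51.6 hours wall-clock), at a cost of approximately 1.9 billion tokens. The abstract reports no baseline, so the size of any efficiency gain is not quantified, but the run shows that this style of automated theorem proving can be scaled across many subagents.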
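
To put the reported figures in proportion, the short Python sketch below derives a few ratios from the numbers in the abstract. These derived values are plain arithmetic, not results reported by the authors, and the per-subagent and per-problem averages assume (purely for illustration) that the token total can be divided evenly.

```python
def run_metrics() -> dict:
    # Figures as stated in the abstract.
    solved, total = 10, 12
    active_hours, wall_hours = 17.7, 51.6
    subagents = 141
    tokens = 1_900_000_000  # "approximately 1.9 billion"

    return {
        "solve_rate": solved / total,
        "active_fraction": active_hours / wall_hours,
        # Rough averages only: the abstract gives no per-subagent breakdown.
        "tokens_per_subagent": tokens / subagents,
        "tokens_per_solved": tokens / solved,
    }


if __name__ == "__main__":
    for name, value in run_metrics().items():
        print(f"{name}: {value:,.3f}")
```

The solve rate works out to roughly 83%, and active compute accounts for roughly a third of the wall-clock time.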

Demerits

Limited Generalizability

The problem set is small (12 problems from a single competition) and the run took place in one isolated environment, so the findings may not generalize. Further research is needed to validate the results across a broader range of problems and environments.

Dependence on MCP Tools

The experiment relies on MCP tools that were designed with Claude by analyzing logs from a prior experiment on miniF2F-Rocq. That tailoring raises questions about how well the results transfer to other proof assistants and problem sets.

Expert Commentary

The results of this experiment are significant and demonstrate the potential of AI-assisted proof verification. However, the small problem set and the reliance on purpose-built MCP tools leave generalizability an open question until the approach is tested on a broader range of problems and environments. AI-assisted proof verification tools have practical implications for mathematicians and researchers, and they raise policy questions about the role of AI in mathematical research.

Recommendations

  • Future experiments should prioritize the development of transferable AI-assisted proof verification tools that can be applied across a range of proof assistants and problem sets.
  • Researchers should explore the potential applications of AI-assisted proof verification in other fields, such as computer science and philosophy, to fully realize the benefits of this technology.

Sources

Original: arXiv - cs.LG