1/3

DeepMind Resolves 9 Open Erdős Problems — All Lean-Kernel Verified

DeepMind's AlphaProof Nexus agent resolved 9 open Erdős problems and 44 OEIS conjectures — all formally verified by the Lean 4 kernel with zero sorry placeholders. First large-scale open-problem resolution via AI + formal proof search. Claim audit: Lean-kernel verification confirmed; scope limited to Mathlib-mature domains.

2026/6/11 · 3:57

图集

What happened: On May 21, 2026, DeepMind and UT Austin released a preprint (arXiv:2605.22763) reporting the first large-scale resolution of open mathematical problems using AI-driven formal proof search. Their most capable agent autonomously resolved 9 out of 353 open Erdős problems and proved 44 out of 492 auto-formalized OEIS conjectures — all verified by the Lean 4 proof kernel, zero sorry placeholders.
This is a different category of result from the OpenAI unit-distance disproof covered last issue: every proof here is machine-checked. The Lean kernel accepted each one end-to-end; human experts only confirmed post-hoc that the formal statement faithfully matched the original informal problem.

How the Agent Works

The system is built on the AlphaProof Nexus framework. The core loop is simple: an LLM generates a Lean proof attempt, the Lean compiler verifies it and returns error feedback, the LLM revises — repeat until a sorry-free proof is accepted or the budget runs out.
The paper evaluates four agent designs:
  • Agent A (basic): multiple independent prover subagents, each running the LLM–Lean feedback loop.
  • Agent B: same as A, but can call AlphaProof (tree-search mode) to discharge hard subgoals.
  • Agent C: adds a shared population database of proof sketches; LLM raters score sketches by plausibility/clarity/novelty, P-UCB sampling drives evolutionary search.
  • Agent D (full): combines AlphaProof tool access + evolutionary search + global goal caching to avoid redundant computation.
Agent D solved the hardest problems (Erdős #125, #138) with 2–5× lower cost than Agent A. Standalone AlphaProof in tree-search mode solved 0/9 — the LLM-guided natural language layer is the critical piece, not just search.
Domains where the agent succeeded: combinatorics, optimization, graph theory, algebraic geometry, additive number theory, quantum optics. All domains where Mathlib is mature and problems decompose into tractable subgoals.

What Was Proved

Nine Erdős open problems including:
  • #12(i), #12(ii) — chromatic / combinatorial
  • #125, #138, #152 — among the hardest cases for basic agents; required Agent D
  • #741(i), #741(ii), #846
  • #26 variant — related to Ben Green's open conjecture list (#57)
Plus 44/492 OEIS conjectures formally proved, plus results in optimization (tight O(1/t) rate for Anchored Gradient Descent-Ascent), a Graffiti conjecture from 1996 (graph theory), Zanello's log-concavity conjecture for pure O-sequences (algebraic geometry), and quantum optics existence conjectures for N=d ∈ {4, 6, 10}.
Cost: a few hundred USD per solved problem (Agent D); high variance. Screening all 353 Erdős problems for tractability required significant total compute.

Claim Audit

ClaimVerdictNotes
Lean-kernel verification✅ VerifiedAll 9 Erdős proofs are sorry-free; Lean kernel accepted each one end-to-end
Autonomy✅ VerifiedAgent found proofs without step-by-step human guidance; pre-formalized statements provided as input
Statement faithfulness✅ VerifiedDomain experts confirmed each formal statement matches the original open problem post-hoc
Scope⚠️ Partial9/353 Erdős problems; selection biased toward Mathlib-mature domains; deep structural results remain out of reach
Cost claim⚠️ Partial"Few hundred USD" is median; true cost has high variance; total screening budget was substantially larger
Independent replication❓ PendingAs of June 10, 2026, no independent third-party replication reported
Key distinction from last issue: The unit-distance disproof (OpenAI, May 20) was verified by human experts (Gowers, Alon, Shankar, Tsimerman) — not the Lean kernel. LeanMarathon tried to formalize it and failed. This paper's 9 Erdős proofs are the inverse: weaker human expert involvement, stronger mechanical verification.

Primary Sources

评论