DeepMind Resolves 9 Open Erdős Problems — All Lean-Kernel Verified

What happened: On May 21, 2026, DeepMind and UT Austin released a preprint (arXiv:2605.22763) reporting the first large-scale resolution of open mathematical problems using AI-driven formal proof search. Their most capable agent autonomously resolved 9 out of 353 open Erdős problems and proved 44 out of 492 auto-formalized OEIS conjectures — all verified by the Lean 4 proof kernel, zero sorry placeholders.

This is a different category of result from the OpenAI unit-distance disproof covered last issue: every proof here is machine-checked. The Lean kernel accepted each one end-to-end; human experts only confirmed post-hoc that the formal statement faithfully matched the original informal problem.

How the Agent Works

The system is built on the AlphaProof Nexus framework. The core loop is simple: an LLM generates a Lean proof attempt, the Lean compiler verifies it and returns error feedback, the LLM revises — repeat until a sorry-free proof is accepted or the budget runs out.

The paper evaluates four agent designs:

Agent A (basic): multiple independent prover subagents, each running the LLM–Lean feedback loop.
Agent B: same as A, but can call AlphaProof (tree-search mode) to discharge hard subgoals.
Agent C: adds a shared population database of proof sketches; LLM raters score sketches by plausibility/clarity/novelty, P-UCB sampling drives evolutionary search.
Agent D (full): combines AlphaProof tool access + evolutionary search + global goal caching to avoid redundant computation.

Agent D solved the hardest problems (Erdős #125, #138) with 2–5× lower cost than Agent A. Standalone AlphaProof in tree-search mode solved 0/9 — the LLM-guided natural language layer is the critical piece, not just search.

Domains where the agent succeeded: combinatorics, optimization, graph theory, algebraic geometry, additive number theory, quantum optics. All domains where Mathlib is mature and problems decompose into tractable subgoals.

What Was Proved

Nine Erdős open problems including:

#12(i), #12(ii) — chromatic / combinatorial
#125, #138, #152 — among the hardest cases for basic agents; required Agent D
#741(i), #741(ii), #846
#26 variant — related to Ben Green's open conjecture list (#57)

Plus 44/492 OEIS conjectures formally proved, plus results in optimization (tight O(1/t) rate for Anchored Gradient Descent-Ascent), a Graffiti conjecture from 1996 (graph theory), Zanello's log-concavity conjecture for pure O-sequences (algebraic geometry), and quantum optics existence conjectures for N=d ∈ {4, 6, 10}.

Cost: a few hundred USD per solved problem (Agent D); high variance. Screening all 353 Erdős problems for tractability required significant total compute.

Claim Audit

Claim	Verdict	Notes
Lean-kernel verification	✅ Verified	All 9 Erdős proofs are `sorry`-free; Lean kernel accepted each one end-to-end
Autonomy	✅ Verified	Agent found proofs without step-by-step human guidance; pre-formalized statements provided as input
Statement faithfulness	✅ Verified	Domain experts confirmed each formal statement matches the original open problem post-hoc
Scope	⚠️ Partial	9/353 Erdős problems; selection biased toward Mathlib-mature domains; deep structural results remain out of reach
Cost claim	⚠️ Partial	"Few hundred USD" is median; true cost has high variance; total screening budget was substantially larger
Independent replication	❓ Pending	As of June 10, 2026, no independent third-party replication reported

Key distinction from last issue: The unit-distance disproof (OpenAI, May 20) was verified by human experts (Gowers, Alon, Shankar, Tsimerman) — not the Lean kernel. LeanMarathon tried to formalize it and failed. This paper's 9 Erdős proofs are the inverse: weaker human expert involvement, stronger mechanical verification.

Primary Sources

Preprint: arXiv:2605.22763v2 — Tsoukalas et al., DeepMind / UT Austin, submitted May 21, revised June 8, 2026
Related: SorryDB paper arXiv:2603.02668 — real-world Lean sorry benchmark (see next issue)

DeepMind Resolves 9 Open Erdős Problems — All Lean-Kernel Verified

How the Agent Works

What Was Proved

Claim Audit

Primary Sources

评论