Anthropic's Claude Opus 4.6 scored 40.7% on FrontierMath Tiers 1–3 (the benchmark's core set of undergraduate- to postdoc-level math problems), behind the 50% record set by OpenAI's GPT-5.4 Pro, according to Epoch AI evaluations in March 2026. On the research-level Tier 4, it roughly quadrupled prior scores, reaching about 21%. This rapid progress highlights how quickly Anthropic is scaling mathematical reasoning amid fierce competition from OpenAI and Google DeepMind. Documents leaked in late March describing "Claude Mythos," a highly capable next-generation model said to pose novel cybersecurity risks, have pushed trader sentiment toward expecting a breakthrough. With Claude 5 expected in Q2, the three months to June 30 leave ample room for new models to shift implied probabilities, though benchmark surprises and release delays remain risks.
Experimental AI-generated summary using Polymarket data · Updated
Anthropic's Claude score on the FrontierMath benchmark by June 30?
$56,176 Vol.
50%+
75%
This market will resolve according to Epoch AI's FrontierMath benchmarking leaderboard (https://epoch.ai/frontiermath) for Tiers 1–3. Studies not included in the leaderboard (e.g. https://x.com/EpochAIResearch/status/1945905796904005720) will not be considered.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Beware of external links.
Frequently Asked Questions