Trader sentiment on Polymarket tilts bearish on Anthropic's Claude reaching a competitive score (likely >10%) on the FrontierMath benchmark by June 30, driven by Claude 3.5 Sonnet's current 1.6% score on the 199-problem test of novel, PhD-level math challenges from Epoch AI. This lags OpenAI's o1-preview at around 5% and Gemini 2.0 experimental at 9.5%, highlighting Claude's reasoning gaps despite strong coding showings. Recent announcements confirm no Claude 4 release until mid-2025 at the earliest, with Anthropic prioritizing safety over rapid math scaling amid competitive pressure from OpenAI's reasoning-chain advances. Traders are watching January developer updates and Q1 earnings for timeline clues as the benchmark evolves with new problems.
Experimental AI-generated summary referencing Polymarket data · Updated
$53,638 Vol.
50%+
52%
This market will resolve according to Epoch AI's FrontierMath benchmarking leaderboard (https://epoch.ai/frontiermath) for Tiers 1-3. Studies that are not included in the leaderboard (e.g. https://x.com/EpochAIResearch/status/1945905796904005720) will not be considered.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market Opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Outcome proposed: Yes
No dispute
Final outcome: Yes


Beware of external links.
Frequently Asked Questions