Traders are betting against a major OpenAI breakthrough on the FrontierMath benchmark by June 30, 2025, with implied probabilities hovering below 20% for scores exceeding 10%. The main driver is the benchmark's extreme difficulty: current top models such as o1-preview score just 2.4% on its hardest problems, far below human experts on subsets. Recent o1 releases showed math gains on AIME (83%) but faltered on FrontierMath, underscoring the limits of current reasoning models amid competitive pressure from Anthropic's Claude 3.5 Sonnet (1.5%) and Google's Gemini. Key catalysts include OpenAI's anticipated GPT-5 rollout in mid-2025 and potential Strawberry reasoning upgrades, though delays and the benchmark's hardness temper optimism ahead of any spring announcements.
Experimental AI-generated summary based on Polymarket data · Updated
60%+: 55%
70%+: 16%
$0.00 Vol.
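The two quoted outcomes can be read as a threshold ladder. The following is a minimal sketch, assuming that each displayed percentage is a price read directly as an implied probability, that the outcomes are nested (a score of 70%+ also counts as 60%+), and that fees and bid/ask spread are ignored. The function name `bucket_probabilities` and the bucket labels are hypothetical and are not part of any Polymarket API.

```python
# Hypothetical helper for illustration only; not a Polymarket API.
# Assumption: each outcome "score >= T" is quoted at a price that we read
# as an implied probability, and higher thresholds are nested in lower ones.

def bucket_probabilities(ladder):
    """Turn nested threshold probabilities into disjoint score buckets.

    ladder: list of (threshold_percent, implied_probability) pairs.
    Returns a dict mapping a bucket label to its implied probability.
    """
    ordered = sorted(ladder)  # ascending by threshold
    buckets = {}

    # Everything below the lowest threshold.
    first_t, first_p = ordered[0]
    buckets[f"below {first_t}%"] = round(1.0 - first_p, 4)

    # Mass between each pair of adjacent thresholds.
    for (lo_t, lo_p), (hi_t, hi_p) in zip(ordered, ordered[1:]):
        if hi_p > lo_p:
            # A higher threshold priced above a lower one is inconsistent
            # under the nesting assumption.
            raise ValueError(f"{hi_t}%+ is priced above {lo_t}%+")
        buckets[f"{lo_t}% to below {hi_t}%"] = round(lo_p - hi_p, 4)

    # Everything at or above the highest threshold.
    last_t, last_p = ordered[-1]
    buckets[f"{last_t}% or higher"] = last_p
    return buckets


# Prices as displayed on the page: 60%+ at 55%, 70%+ at 16%.
ladder = [(60, 0.55), (70, 0.16)]
buckets = bucket_probabilities(ladder)
for label, prob in buckets.items():
    print(f"{label}: {prob:.0%}")
```

Under these assumptions, the quoted prices would put roughly 39% on a leaderboard score of at least 60% but below 70%, which is simply the 55% price minus the 16% price.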
This market will resolve according to Epoch AI's FrontierMath benchmark leaderboard (https://epoch.ai/frontiermath) for Tiers 1-3. Studies that are not included in the leaderboard (e.g. https://x.com/EpochAIResearch/status/1945905796904005720) will not be considered.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market opened: Jan 29, 2026, 12:47 PM ET
Resolver: 0x65070BE91...
Beware of external links.
Frequently asked questions