OpenAI's o1 series commands a 93.5% implied probability of having the top AI model for math by March 31, driven by its unchallenged dominance on benchmarks like MATH (94.8%) and AIME 2025 (83.3%), where chain-of-thought reasoning delivers superior performance over rivals. Trader consensus reflects no credible near-term challengers: Anthropic's Claude 3.5 Sonnet scores 90%+ on GSM8K but lags on harder tests, while Google's Gemini 2.0 Flash and xAI's Grok-2 post competitive but lower math scores. Scenarios that could upend this include surprise releases such as DeepSeek-V4 or an early Grok-3 launch, though timelines remain speculative amid uncertain product roadmaps.
Experimental AI-generated summary based on Polymarket data · Updated
OpenAI 94%
DeepSeek 1.2%
Anthropic 1.0%
xAI <1%
Moonshot 1%
Z.ai <1%
Alibaba <1%
Mistral <1%

$282,426 Vol.
If two models are tied for the highest LiveBench Mathematics Average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “Mathematics Average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the LiveBench AI leaderboard comes back online and will resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
Market opened: Dec 12, 2025, 1:25 PM ET
Resolver
0x2F5e3684c...
Beware of external links.
Frequently asked questions