OpenAI holds a commanding 93.5% implied probability of having the best AI model on math benchmarks by March 31, driven by its o1 reasoning series' unmatched performance (83.3% on the MATH dataset and 74.3% on AIME problems), released in September 2024 and still leading public leaderboards such as LMSYS Arena's math category. Competitors such as xAI's Grok-2 (around 60% on MATH), DeepSeek-V3, and Google's Gemini trail significantly, with no verified demonstrations of surpassing it. Trader sentiment reflects low short-term disruption risk, but realistic challenges include xAI's Grok-3 release (trained on massive compute and teased for Q1 2025) or breakthroughs from Anthropic's Claude or Moonshot AI eclipsing o1's chain-of-thought math reasoning before the deadline.
Experimental AI-generated summary using Polymarket data · Updated

Implied odds ($145,484 volume):
OpenAI: 94%
xAI: 1.7%
DeepSeek: 1.6%
Moonshot: 1.2%
(unnamed in source): 1%
Anthropic: 1%
Z.ai: <1%
Alibaba: <1%
Mistral: <1%
If two models are tied for the highest LiveBench Mathematics Average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “Mathematics Average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the LiveBench AI leaderboard comes back online and will resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
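The resolution procedure above (highest LiveBench Mathematics Average, ties broken by whichever company name comes first alphabetically) can be sketched as follows. The function name and the example scores are hypothetical illustrations, not actual leaderboard values:

```python
def resolve_market(scores):
    """Pick the winning company per this market's rules:
    highest LiveBench Mathematics Average; a tie resolves to the
    alphabetically first company name as listed in the market group.
    `scores` maps company name -> Mathematics Average."""
    best = max(scores.values())
    tied = [name for name, score in scores.items() if score == best]
    return min(tied)  # alphabetically first among the tied leaders

# Hypothetical scores for illustration only
example = {"OpenAI": 92.1, "xAI": 92.1, "DeepSeek": 88.4}
print(resolve_market(example))  # → OpenAI
```

Note that the tie-break compares the names as written in the market group, so capitalization matters if the comparison is a plain string sort.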
Market opened: Dec 12, 2025, 1:25 PM ET
Resolver: 0x2F5e3684c...
Beware of external links.
Frequently asked questions