Traders assign OpenAI a 99.2% implied probability of fielding the best AI model on math benchmarks by March 31, propelled by its o1 series' record performance since September 2024, including 94.1% accuracy on the rigorous MATH dataset and 83.3% on AIME 2024 competition problems, figures that eclipse rivals such as Anthropic's Claude 3.5 Sonnet (around 80% on MATH) and Google's Gemini models. This dominance stems from o1's chain-of-thought reasoning, which excels at multi-step proofs and Olympiad-level problems, and no competitor has credibly announced superior capabilities in the past 30 days. Scenarios that could upset the consensus include an unexpected release from DeepMind or xAI with validated benchmark gains, or a shift in evaluation criteria, though typical AI development timelines make such a reversal unlikely by quarter-end.
Experimental AI-generated summary with Polymarket data · Updated

OpenAI: 99.3%
xAI: <1%
Google: <1%
DeepSeek: <1%
Anthropic: <1%
Z.ai: <1%
Mistral: <1%
Alibaba: <1%
Moonshot: <1%

$472,975 Vol.
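The percentages above are simply Yes-share prices read as probabilities. Here is a minimal sketch of that conversion, assuming Polymarket's standard binary-market convention (a Yes share pays $1.00 if the outcome occurs and $0.00 otherwise); the `implied_probability` helper is invented for illustration:

```python
# Minimal sketch, not Polymarket's code: converts a Yes-share price
# to an implied probability. Assumes the standard binary-market
# convention where a Yes share pays $1.00 if the outcome occurs and
# $0.00 otherwise, so the trading price approximates the market's
# probability estimate.

def implied_probability(yes_price_usd: float) -> float:
    """Implied probability, in percent, for a given Yes-share price."""
    return yes_price_usd / 1.00 * 100

print(implied_probability(0.992))  # 99.2 -> the 99.2% quoted in the summary
```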
If two models are tied for the highest LiveBench Mathematics Average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “Mathematics Average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the LiveBench AI leaderboard comes back online and will resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
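The scoring and tie-break rules above can be expressed compactly. The following is an illustrative sketch, not official resolution code: the scores are hypothetical placeholders and the `resolve` helper is invented for this example; only the rule itself (highest Mathematics Average wins, ties break toward the alphabetically first company name) comes from the market description:

```python
# Illustrative sketch of the resolution rule, not an official resolver:
# the company with the highest LiveBench "Mathematics Average" wins,
# and ties break toward the alphabetically first company name as it
# appears in this market group. Scores below are hypothetical.

def resolve(scores: dict[str, float]) -> str:
    """Winning company: highest score, alphabetical tie-break."""
    best = max(scores.values())
    tied = [name for name, score in scores.items() if score == best]
    return min(tied)  # min() on strings picks the alphabetically first name

hypothetical = {"OpenAI": 92.4, "Google": 92.4, "xAI": 88.1}
print(resolve(hypothetical))  # "Google": tied with OpenAI, earlier alphabetically
```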
Market opened: Dec 12, 2025, 1:25 PM ET
Resolver: 0x2F5e3684c...
Beware of external links.
Frequently asked questions