OpenAI commands a 94.5% implied probability on Polymarket for fielding the top AI model on math benchmarks by March 31, driven by its o1-preview and o1-mini models' unmatched performance on key tests such as the MATH dataset (94.8% accuracy) and AIME 2024 (83%), far surpassing rivals such as DeepSeek's math-focused variants and Anthropic's Claude 3.5 Sonnet. Trader consensus reflects o1's chain-of-thought reasoning edge, with no verified competitor release closing the gap since the models' September 2024 launch. Upsets could come from xAI's anticipated Grok-3 rollout or Google's Gemini updates eclipsing o1's scores before resolution, though historical benchmark slippage tempers those odds.
Experimental AI-generated summary referencing Polymarket data · Updated

OpenAI 95%
DeepSeek 1.7%
xAI 1.6%
Anthropic 1.2%
Moonshot 1%
Z.ai <1%
Alibaba <1%
Mistral <1%

$315,977 Vol.
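For context on the "implied probability" figures above: a Polymarket Yes share pays out $1 if the event occurs, so its trading price approximates the market's probability estimate. A minimal sketch (the price below is illustrative, not a live quote):

```python
def implied_probability(yes_price_usd: float) -> float:
    """A Yes share pays $1 on resolution, so its price in dollars
    approximates the market's probability estimate for the event."""
    return yes_price_usd * 100

# A Yes share trading at $0.945 implies roughly a 94.5% probability.
print(f"{implied_probability(0.945):.1f}%")  # → 94.5%
```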
If two models are tied for the highest LiveBench Mathematics Average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “Mathematics Average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the LiveBench AI leaderboard comes back online and will resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
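The tie-break rule above is simple to state in code; a minimal sketch, assuming a mapping of company names to Mathematics Average scores (the names and scores below are illustrative, not live LiveBench values):

```python
def resolve_market(scores: dict[str, float]) -> str:
    """Winner: highest Mathematics Average score; if two or more
    companies tie for the top score, the alphabetically first
    company name (as written in the market group) wins."""
    top = max(scores.values())
    tied = [name for name, score in scores.items() if score == top]
    return min(tied)  # alphabetical tie-break

# With OpenAI and DeepSeek tied at 94.8, DeepSeek wins alphabetically.
print(resolve_market({"OpenAI": 94.8, "DeepSeek": 94.8, "xAI": 90.1}))
```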
Market Opened: Dec 12, 2025, 1:25 PM ET
Resolver
0x2F5e3684c...

