OpenAI's o1 reasoning model dominates trader sentiment, with a 93% implied probability of being named the best AI for math by March 31. The odds are driven by its verified supremacy on benchmarks like GSM8K (96.8% accuracy) and MATH (94.1%), far outpacing rivals such as Anthropic's Claude 3.5 Sonnet (92.1% on GSM8K) and DeepSeek's V3. This lead stems from o1's chain-of-thought techniques, which enable superior step-by-step problem-solving; no competitor had surpassed it in official evals as of late 2024. Challenges could come from xAI's anticipated Grok-3 release, Google's Gemini 2.0 updates, or rapid iteration from DeepSeek, but traders see slim odds of any rival overtaking it absent a major pre-deadline breakthrough.
Experimental AI-generated summary referencing Polymarket data · Updated

OpenAI: 93%
DeepSeek: 2.4%
Anthropic: 2.3%
xAI: 2.3%
(unnamed): 1%
Z.ai: <1%
Alibaba: <1%
Moonshot: <1%
Mistral: <1%

$324,463 Vol.
If two models are tied for the highest LiveBench Mathematics Average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “Mathematics Average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the LiveBench AI leaderboard comes back online and will resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
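As a rough illustration, the resolution rule above (highest "Mathematics Average" score wins, with exact ties broken by the alphabetically-first company name) can be sketched as follows. The scores and company set here are placeholders, not actual LiveBench data:

```python
# Hypothetical sketch of this market's resolution logic: the company with the
# highest LiveBench "Mathematics Average" score resolves YES; if two or more
# companies are exactly tied, the name that comes first alphabetically wins.
def resolve_market(scores: dict[str, float]) -> str:
    best = max(scores.values())                          # top score on the leaderboard
    tied = [name for name, s in scores.items() if s == best]
    return min(tied)                                     # alphabetical tie-break

# Illustrative example (made-up scores): OpenAI and DeepSeek tie, and
# "DeepSeek" precedes "OpenAI" alphabetically, so DeepSeek wins the tie.
example = {"OpenAI": 84.2, "DeepSeek": 84.2, "Anthropic": 81.0}
print(resolve_market(example))  # DeepSeek
```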
Market Opened: Dec 12, 2025, 1:25 PM ET
Resolver: 0x2F5e3684c...


Beware of external links.