OpenAI's o1 reasoning models dominate trader sentiment, with a 94.5% implied probability of having the best AI model for math by March 31. That consensus is driven by record scores, including 94.8% on the MATH benchmark and 83.3% on AIME 2024, which outpace rivals on standard evals from sources like Artificial Analysis. No competitor, including DeepSeek-V3 or Anthropic's Claude 3.5 Sonnet, has demonstrated superior math reasoning in verified tests, and the release pipeline is quiet. The main risk is a surprise launch, such as xAI's Grok-3 (training complete, potential early rollout) or Google's Gemini 2.0 updates, topping the leaderboard before the deadline, though historical schedule slips temper expectations.
Experimental AI-generated summary referencing Polymarket data · Updated
OpenAI 95%
DeepSeek 1.7%
xAI 1.6%
Anthropic 1.2%
1%
Moonshot 1%
Z.ai <1%
Alibaba <1%
Mistral <1%

$315,946 Vol.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “Mathematics Average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the LiveBench AI leaderboard comes back online and will resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
If two models are tied for the highest LiveBench Mathematics Average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order, as sketched below.
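To make the resolution rule concrete, here is a minimal sketch in Python. The (company, score) input format is hypothetical; LiveBench's actual data layout and any code for fetching it are not specified here.

```python
# Minimal sketch of this market's resolution rule, assuming the LiveBench
# "Mathematics Average" column has been fetched into (company, score) pairs.
# The input format is hypothetical; livebench.ai's real layout may differ.

def resolve_market(leaderboard: list[tuple[str, float]]) -> str:
    """Pick the winner: highest Mathematics Average, with ties broken by
    alphabetical order of the company name as listed in the market group."""
    if not leaderboard:
        # Mirrors the rule that the market waits until the source returns.
        raise RuntimeError("LiveBench unavailable; re-check at the next interval")
    top_score = max(score for _, score in leaderboard)
    tied = sorted(company for company, score in leaderboard if score == top_score)
    return tied[0]  # alphabetical tiebreak

# Illustrative scores only (not real LiveBench numbers):
print(resolve_market([("OpenAI", 92.4), ("DeepSeek", 92.4), ("xAI", 88.1)]))
# -> "DeepSeek": tied at the top, and "D" sorts before "O" alphabetically
```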
Market Opened: Dec 12, 2025, 1:25 PM ET
Resolver: 0x2F5e3684c...


Beware of external links.