OpenAI's o1 series commands a 94.5% implied probability of fielding the top math-performing AI model by March 31, driven by its unchallenged lead on benchmarks like MATH (83.3% accuracy) and AIME 2024 (high-90s scores), thanks to advanced chain-of-thought reasoning suited to complex problem-solving. Recent evals show o1-pro outperforming rivals such as DeepSeek-R1 (strong but trailing at ~70% on MATH) and Anthropic's Claude 3.5 Sonnet, with no verified announcements of superior pre-March 31 releases from competitors. Trader consensus reflects this edge amid quiet development cycles. Challenges could arise from surprise launches, such as xAI's Grok-3 or Google's Gemini 2.0 Flash updates, if they demonstrably surpass o1 on independent leaderboards.
Experimental AI-generated summary referencing Polymarket data · Updated

OpenAI 95%
DeepSeek 1.6%
xAI 1.6%
Anthropic 1.2%
1%
Moonshot 1%
Z.ai <1%
Alibaba <1%
Mistral <1%

$315,922 Vol.
If two models are tied for the highest LiveBench Mathematics Average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “Mathematics Average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the LiveBench AI leaderboard comes back online and will resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
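The tie-break rule above amounts to a deterministic sort: rank companies by their best Mathematics Average score descending, then break exact ties by company name ascending. A minimal sketch of that rule, assuming a hypothetical {company: score} snapshot rather than LiveBench's actual data format:

```python
# Sketch of the resolution rule: highest LiveBench "Mathematics Average"
# wins; exact ties go to the company name that sorts first alphabetically.
# The snapshot below is a hypothetical placeholder, not real LiveBench data.

def resolve_market(scores: dict[str, float]) -> str:
    """scores maps each company (as named in this market group) to its
    best model's Mathematics Average at check time."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[0][0]

snapshot = {"OpenAI": 92.1, "Anthropic": 92.1, "DeepSeek": 88.4}
print(resolve_market(snapshot))  # -> "Anthropic" (tie broken alphabetically)
```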
Market Opened: Dec 12, 2025, 1:25 PM ET
Resolver: 0x2F5e3684c...

