OpenAI holds a commanding 94.5% implied probability of leading in AI math performance by March 31, propelled by its o1-preview and o1-mini models' dominance on rigorous benchmarks like MATH (94.8% accuracy) and AIME 2024 (83.3%), leveraging advanced chain-of-thought reasoning for graduate-level problem-solving that outpaces competitors. DeepSeek's strong but trailing scores (e.g., 79.8% on MATH) and others like Anthropic's Claude 3.5 Sonnet underscore trader consensus on OpenAI's edge amid no major rival breakthroughs. Realistic challenges include xAI's Grok-3 launch, Google's Gemini 2.0 updates, or surprise releases shifting leaderboard standings before resolution.
Experimental AI-generated summary referencing Polymarket data · Updated

OpenAI 95%
DeepSeek 1.6%
xAI 1.6%
Anthropic 1.2%
1%
Moonshot 1%
Z.ai <1%
Alibaba <1%
Mistral <1%
$315,922 Vol.
If two models are tied for the highest LiveBench Mathematics Average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “Mathematics Average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the LiveBench AI leaderboard comes back online and will resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
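The resolution rules above can be sketched in code: the company with the highest LiveBench Mathematics Average wins, and a tie is broken alphabetically by company name. This is a minimal illustration with made-up scores, not real LiveBench data; the function name and score values are assumptions for the example.

```python
def resolve_market(scores: dict[str, float]) -> str:
    """Return the winning company per this market's rules.

    scores maps company name -> LiveBench "Mathematics Average".
    Highest score wins; ties go to the alphabetically first name
    (case-insensitive, so "OpenAI" vs "xAI" compares o-vs-x).
    """
    best = max(scores.values())
    tied = [name for name, score in scores.items() if score == best]
    return min(tied, key=str.casefold)  # alphabetical tie-break

# Hypothetical example: OpenAI and xAI tied at the top.
print(resolve_market({"OpenAI": 92.0, "xAI": 92.0, "DeepSeek": 88.5}))
# → OpenAI
```

Note the `str.casefold` key: without it, Python's default string ordering is case-sensitive (all uppercase letters sort before lowercase), which would not match a plain alphabetical reading of the rule.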
Market Opened: Dec 12, 2025, 1:25 PM ET
Resolver: 0x2F5e3684c...


Beware of external links.