OpenAI's o1 reasoning models, released in September 2024, have established a commanding lead in AI math capabilities, achieving state-of-the-art scores like 94.8% on the MATH benchmark and 83.3% on AIME—far surpassing competitors such as Google's Gemini 1.5 (92.5% MATH) and Anthropic's Claude 3.5 Sonnet (88.8%). This technical dominance, demonstrated through chain-of-thought reasoning advancements, underpins the 98.6% trader consensus, reflecting aggregated skin-in-the-game bets on sustained superiority through March 31, 2025. Challenges could arise from imminent releases like Google's Gemini 2.0 (expected December) or DeepSeek-V3, if they exceed o1 on key math evaluations, though historical launch delays and benchmark volatility temper such risks. Traders eye LMSYS Arena math leaderboards and independent evals as pivotal near-term catalysts.
Experimental AI-generated summary with Polymarket data · Updated
Which company will have the best AI model for math on March 31?
OpenAI 98.6%
Google <1%
DeepSeek <1%
Anthropic <1%
$444,107 Vol.

OpenAI 99%
Google 1%
DeepSeek <1%
Anthropic <1%
xAI <1%
Moonshot <1%
Z.ai <1%
Mistral <1%
Alibaba <1%
If two models are tied for the highest LiveBench Mathematics Average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “Mathematics Average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the LiveBench AI leaderboard comes back online and will resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
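The resolution rule above (highest LiveBench "Mathematics Average" wins; ties break in favor of the company whose name comes first alphabetically) can be sketched as a small Python function. The company names and scores below are hypothetical, not actual LiveBench data:

```python
def resolve(scores: dict[str, float]) -> str:
    """Return the winning company given {company: mathematics_average}.

    Highest score wins; on a tie, the alphabetically first company name
    wins, per the market's stated tie-break rule.
    """
    # Negating the score sorts high scores first, while the company name
    # breaks ties in ascending (alphabetical) order.
    return min(scores, key=lambda company: (-scores[company], company))

# Hypothetical tie: OpenAI and Google both at 92.1 -> "Google" resolves,
# since "G" precedes "O" alphabetically.
print(resolve({"OpenAI": 92.1, "Google": 92.1, "DeepSeek": 88.4}))
```

Note that the alphabetical ordering applies to company names as listed in this market group, so the exact spellings used on the page (e.g. "Z.ai", "xAI") would determine tie-break order.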
Market Opened: Dec 12, 2025, 1:25 PM ET
Resolver
0x2F5e3684c...