OpenAI's o1 models maintain an unchallenged lead in mathematical reasoning benchmarks, driving 98.2% market-implied odds for topping the leaderboard on March 31. Released in September 2024, o1-preview and o1-mini achieved breakthroughs like 94.8% on the MATH dataset and 83.3% on AIME 2024 via advanced chain-of-thought reasoning and test-time compute, outpacing rivals such as Anthropic's Claude 3.5 Sonnet (around 71% on AIME) and Google's Gemini 2.0. No credible announcements or leaks indicate competitors closing the gap before the deadline, with trader consensus reflecting real-money bets on sustained dominance amid quiet development cycles at other AI labs. Realistic challenges include a surprise model release from DeepMind or Anthropic with verified superior benchmarks, though historical launch timelines make this improbable in the next two weeks.
Experimental AI summary based on Polymarket data · Updated

OpenAI 98.2% · Google <1% · Anthropic <1% · DeepSeek <1%
$413,708 volume

OpenAI    98%
Google    1%
Anthropic 1%
DeepSeek  <1%
xAI       <1%
Z.ai      <1%
Alibaba   <1%
Moonshot  <1%
Mistral   <1%
If two models are tied for the highest LiveBench Mathematics Average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “Mathematics Average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the LiveBench AI leaderboard comes back online and will resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
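The tie-break rule above can be sketched as a small function: pick the highest Mathematics Average, and break ties alphabetically by company name. This is an illustrative sketch only; `resolve_market` and the sample scores are hypothetical and not taken from the live LiveBench leaderboard.

```python
def resolve_market(scores: dict[str, float]) -> str:
    """Return the winning company per the market rules:
    highest Mathematics Average, ties broken by whichever
    company name comes first in alphabetical order."""
    best = max(scores.values())
    # Collect every company tied at the top score.
    tied = [name for name, score in scores.items() if score == best]
    # min() on strings gives the alphabetically first name.
    return min(tied)

# Hypothetical scores for illustration:
print(resolve_market({"OpenAI": 94.8, "Google": 94.8, "Anthropic": 90.1}))
# → Google (tied with OpenAI, but alphabetically first)
```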
Market opened: Dec 12, 2025, 1:25 PM ET
Resolver: 0x2F5e3684c...
Beware of external links.
Frequently Asked Questions