OpenAI's o1 model dominates trader sentiment with 93.5% implied odds for the best AI math performance by March 31, propelled by its state-of-the-art results on rigorous benchmarks like the MATH dataset (94.8% accuracy) and AIME problems (83%), far surpassing competitors via superior chain-of-thought reasoning. Recent leaderboard dominance, solidified since o1's September 2024 launch, underpins this consensus, as no rival—DeepSeek's math-tuned models, Anthropic's Claude 3.5 Sonnet, or Google's Gemini—has closed the gap. Realistic challenges include pre-deadline releases like xAI's Grok-3, Moonshot's Kimi upgrades, or Mistral's next large model eclipsing o1 on eval suites such as GSM8K or GPQA, though timelines remain uncertain.
Experimental AI-generated summary based on Polymarket data · Updated

OpenAI 94%
DeepSeek 1.1%
Anthropic 1.0%
xAI <1%
1%
Moonshot 1%
Z.ai <1%
Alibaba <1%
Mistral <1%
$283,877 Vol.
If two models are tied for the highest LiveBench Mathematics Average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “Mathematics Average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the LiveBench AI leaderboard comes back online and will resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
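The resolution rule above can be sketched as a small function: take the highest LiveBench "Mathematics Average" score, and if two or more companies are tied, resolve to the alphabetically first company name. This is an illustrative sketch of the stated rule, not an official resolver implementation, and the scores used below are made up for demonstration.

```python
def resolve_market(scores: dict[str, float]) -> str:
    """Return the winning company per the market's stated rule:
    highest Mathematics Average score, ties broken by alphabetical
    order of the company name as listed in the market group."""
    best = max(scores.values())
    tied = [name for name, score in scores.items() if score == best]
    return min(tied)  # alphabetical tie-break

# Illustrative scores only, not real leaderboard data:
print(resolve_market({"OpenAI": 92.3, "DeepSeek": 92.3, "Anthropic": 90.1}))
# → DeepSeek ("D" precedes "O" alphabetically)
```

Note that the tie-break compares the company names as strings, so the resolution depends on exactly how each company is named in the market group.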
Market start date: Dec 12, 2025, 1:25 PM ET
Resolver: 0x2F5e3684c...
Be careful with external links.
Frequently asked questions