OpenAI's 91% implied probability dominates trader sentiment for the best AI coding model by March 31, fueled by o1-preview's elite performance on reasoning-intensive benchmarks such as HumanEval and LiveCodeBench, where its chain-of-thought approach excels at complex, multi-step programming tasks. This edge, demonstrated in official evals showing PhD-level coding proficiency, underpins confidence in OpenAI's scaling trajectory amid rumors of a full o1 release or GPT-5 iterations pushing SWE-Bench scores beyond 40%. While verifiable leaderboards currently favor Anthropic's Claude 3.5 Sonnet, realistic challengers include a Claude 4 launch, Google's Gemini 2.0 at I/O, or xAI's Grok-3, if any of them surpasses o1 on public coding evals before the deadline.
Experimental AI summary based on Polymarket data · Updated:
OpenAI 91%
Anthropic 6.8%
Google 1.3%
DeepSeek <1%
$963,526 volume

OpenAI 91%
Anthropic 7%
Google 1%
DeepSeek <1%
xAI <1%
Z.ai <1%
Mistral <1%
Alibaba <1%
Moonshot <1%
If two models are tied for the top LiveBench coding average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “coding average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the leaderboard comes back online and resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
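The tie-break rule above can be sketched in a few lines of Python. This is a minimal illustration, not the resolver's actual implementation: the scores below are hypothetical placeholders, not real LiveBench "coding average" values.

```python
# Hypothetical resolution sketch. Rule per the market text: the company
# with the highest LiveBench "coding average" wins; an exact tie resolves
# to whichever tied company name comes first alphabetically.
scores = {
    "Anthropic": 73.2,  # illustrative score, not real LiveBench data
    "OpenAI": 73.2,     # tied with Anthropic in this example
    "Google": 70.1,
}

# Sort key: score descending (negated), then company name ascending.
winner = min(scores.items(), key=lambda kv: (-kv[1], kv[0]))[0]
print(winner)  # → "Anthropic" (wins the alphabetical tie-break)
```

Negating the score inside the key tuple lets a single `min` handle both criteria: the highest score sorts first, and among equal scores the lexicographically smaller name wins.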
Market opened: Dec 12, 2025, 1:29 PM ET
Resolver: 0x2F5e3684c...