OpenAI commands a 94.8% implied probability on Polymarket for the best AI model in coding by March 31, reflecting trader consensus around its o1 series' dominance on benchmarks such as SWE-Bench Verified (leading at a 48.9% resolution rate) and HumanEval, where advanced chain-of-thought reasoning gives it an edge in complex code generation and debugging over Anthropic's Claude 3.5 Sonnet (3.1%) and open-source challengers like DeepSeek. No major rival releases or benchmark upsets have emerged in the past 30 days, amid a lull following o1's September debut and Claude's October refinements. Realistic risks include surprise model launches from Google DeepMind or xAI before the deadline, regulatory scrutiny delaying deployments, or new evaluations exposing gaps in edge-case performance.
Experimental AI summary based on Polymarket data · Updated

OpenAI 95.5%
Anthropic 2.4%
DeepSeek <1%
Google <1%
$1,053,032 volume

OpenAI 96%
Anthropic 2%
DeepSeek 1%
Google <1%
xAI <1%
Z.ai <1%
Mistral <1%
Alibaba <1%
Moonshot <1%

If two models are tied for the top LiveBench coding average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “coding average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the leaderboard comes back online and resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
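The resolution rules above can be sketched in code: take the top "coding average" score from the LiveBench leaderboard, and on an exact tie prefer the company whose name (as written in this market group) comes first alphabetically. This is a hypothetical illustration with made-up scores, not Polymarket's actual resolver logic.

```python
# Sketch of the market's resolution rule (illustrative only):
# highest LiveBench "coding average" wins; exact ties go to the
# alphabetically first company name as listed in the market group.
def resolve_market(leaderboard: dict[str, float]) -> str:
    """leaderboard maps company name -> LiveBench coding-average score."""
    # Sort by score descending, then by name ascending to break ties;
    # min() with a (-score, name) key does both in one pass.
    return min(leaderboard, key=lambda name: (-leaderboard[name], name))

# Example with invented scores: Anthropic and OpenAI tie, so the
# alphabetical rule picks Anthropic.
scores = {"OpenAI": 74.2, "Anthropic": 74.2, "DeepSeek": 71.0}
print(resolve_market(scores))  # -> Anthropic
```

The `(-score, name)` key encodes both criteria at once, so no separate tie-break branch is needed.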
Market opened: Dec 12, 2025, 1:29 PM ET
Resolver
0x2F5e3684c...