OpenAI commands a 96.2% implied probability on Polymarket for best AI model in coding by March 31, reflecting trader consensus around its o1 series' dominance on benchmarks such as SWE-Bench Verified (leading at a 48.9% resolve rate) and HumanEval, where advanced chain-of-thought reasoning outperforms Anthropic's Claude 3.5 Sonnet (1.1%) and open-source challengers like DeepSeek on complex code generation and debugging. No major rival release or benchmark upset has emerged in the past 30 days, amid a lull following o1's September debut and Claude's October refinements. Realistic risks include a surprise model drop from Google DeepMind or xAI before the deadline, regulatory scrutiny delaying deployments, or new evals exposing gaps in edge-case performance.
Experimental AI-generated summary from Polymarket data · Updated

Current odds:
OpenAI 96.2%
Anthropic 1.1%
DeepSeek <1%
Google <1%
xAI <1%
Z.ai <1%
Mistral <1%
Alibaba <1%
Moonshot <1%
Volume: $1,053,601
If two models are tied for the top LiveBench coding average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “coding average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the leaderboard comes back online and resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
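The resolution rule above combines two steps: take the top "coding average" score on the LiveBench leaderboard, then break any tie alphabetically by company name as written in the market group. A minimal sketch of that logic, assuming a simple `{company: score}` snapshot (the function name, data shape, and scores are illustrative, not LiveBench's real API):

```python
# Hypothetical sketch of the market's tie-break resolution logic.
# Company names and scores below are made up for illustration.

def resolve_market(leaderboard):
    """Pick the winning company from a {company: coding_average} dict.

    Ties on the top score resolve to the company whose name comes
    first in alphabetical order, per the market rules.
    """
    top_score = max(leaderboard.values())
    tied = [name for name, score in leaderboard.items() if score == top_score]
    return min(tied)  # alphabetical tie-break

# Example: a two-way tie at the top resolves alphabetically.
snapshot = {"OpenAI": 74.2, "Anthropic": 74.2, "DeepSeek": 70.1}
print(resolve_market(snapshot))  # -> Anthropic
```

Note that `min` on Python strings already gives lexicographic order, which is what the alphabetical tie-break calls for when names share a consistent capitalization style.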
Market opened: Dec 12, 2025, 1:29 PM ET
Resolver
0x2F5e3684c...
Beware of external links.
Frequently asked questions