OpenAI's 93% implied probability of fielding the best AI coding model by March 31 reflects trader consensus on its o1 series' benchmark dominance, including top scores on HumanEval for code generation and SWE-bench Verified for real-world software engineering tasks, where o1-preview outperforms rivals like Claude 3.5 Sonnet on reasoning-heavy debugging. Recent o1-mini releases have further solidified this edge with efficient, high-accuracy coding at lower cost, amid rumors of full o1 or GPT-4.5 launches by Q1 2025. Risks to this lead include Anthropic iterating to Claude 4, Google's Gemini 2.0 scaling compute for coding, and DeepSeek's open-source advances surprising on leaderboards like LMSYS Arena's coding category.
Experimental AI-generated summary based on Polymarket data · Updated

OpenAI 94%
Anthropic 5.2%
Google <1%
DeepSeek <1%
xAI <1%
Z.ai <1%
Mistral <1%
Alibaba <1%
Moonshot <1%

$939,919 Vol.
If two models are tied for the top LiveBench coding average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “coding average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the leaderboard comes back online and resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
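The resolution rules above amount to a simple two-step comparison: take the company with the highest LiveBench coding average, and break any tie alphabetically by company name. A minimal sketch of that logic follows, using hypothetical company names and scores rather than real leaderboard data:

    # Hypothetical sketch of the resolution rule described above.
    # Scores are illustrative only, not actual LiveBench results.

    def resolve_market(coding_averages: dict[str, float]) -> str:
        """Return the winning company: highest coding average,
        with ties broken by alphabetical order of company name."""
        top_score = max(coding_averages.values())
        tied = [name for name, score in coding_averages.items() if score == top_score]
        return min(tied)  # alphabetical tie-break

    # Example with made-up scores: OpenAI and Anthropic tie,
    # so the alphabetically first name wins.
    print(resolve_market({"OpenAI": 80.1, "Anthropic": 80.1, "Google": 77.5}))
    # -> "Anthropic"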
Market opened: Dec 12, 2025, 1:29 PM ET
Resolver
0x2F5e3684c...
Frequently asked questions