OpenAI dominates trader sentiment with a 92% implied probability of fielding the best AI coding model by March 31, propelled by o1-preview's 48.9% score on SWE-Bench Verified, a rigorous benchmark of real-world software engineering tasks. That score narrowly trails Anthropic's Claude 3.5 Sonnet at 49% under optimized conditions, but traders credit OpenAI's rapid iteration cycle, vast compute resources, and developer preference for o1's chain-of-thought reasoning in complex coding, with o3 or GPT-5-class successors expected soon. The consensus could be challenged if Anthropic releases Claude 4 or Google advances Gemini 2.0 with superior fine-tuning, though precedent favors OpenAI's edge in frontier capabilities.
Experimental AI-generated summary from Polymarket data · Updated

OpenAI 89%
Anthropic 6.5%
Google 1.1%
DeepSeek <1%
Z.ai <1%
Mistral <1%
Alibaba <1%
xAI <1%
Moonshot <1%

$984,693 Vol.
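The percentages above are implied probabilities in the standard prediction-market sense: a binary-outcome share pays $1 if its outcome wins, so its trading price (in dollars) approximates the market's probability estimate. A minimal sketch of that convention (function names are illustrative, not from Polymarket's API):

```python
def implied_prob(price_usd: float) -> float:
    # A share paying $1 on a win that trades at $0.89 implies ~89% probability.
    return price_usd / 1.00

def decimal_odds(price_usd: float) -> float:
    # Equivalent decimal odds: payout per dollar staked at that price.
    return 1.00 / price_usd

print(implied_prob(0.89))   # OpenAI's share price as a probability
print(decimal_odds(0.89))   # ~1.12x payout if OpenAI wins
```

Note that raw prices across all outcomes can sum to slightly more or less than 100% due to spreads, which is why the displayed figures (89%, 6.5%, 1.1%, several "<1%") need not total exactly 100.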
If two models are tied for the top LiveBench coding average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “coding average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the leaderboard comes back online and resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
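The tie-break rule above (highest LiveBench coding average wins; ties go to the alphabetically first company name) can be sketched as a small helper. This is an illustration of the stated rule, not Polymarket's actual resolution code:

```python
def resolve_winner(coding_averages: dict[str, float]) -> str:
    """Return the winning company per the market's rules.

    Highest coding average wins; if two or more are tied at the top,
    the company whose name comes first alphabetically wins.
    """
    # Sort by (-score, name): best score first, alphabetical within ties.
    return min(coding_averages, key=lambda name: (-coding_averages[name], name))

# Hypothetical scores for illustration only:
print(resolve_winner({"OpenAI": 80.0, "Anthropic": 80.0, "Google": 75.2}))
```

With the hypothetical tie at 80.0 above, "Anthropic" wins because it precedes "OpenAI" alphabetically.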
Market opened: Dec 12, 2025, 1:29 PM ET
Resolver: 0x2F5e3684c...