OpenAI's GPT-5.4 Thinking xHigh Effort model holds a dominant 94.15% score on LiveBench's Mathematics Average leaderboard, the key resolution criterion for this market, more than three points ahead of both Google's Gemini 3.1 Pro Preview at 91.04% and OpenAI's own other variants, which drives the 99.5% implied probability priced in by traders. That lead stems from GPT-5.4's March release, which set records on rigorous math benchmarks such as FrontierMath Tier 4 (a 38% solve rate on research-level problems) and MATH-500 (near 99%), outpacing rivals in a field that includes Anthropic's Claude 4.x and xAI's Grok. With resolution due on March 31, the upset scenarios are slim: a surprise release from Google or DeepSeek that tops the leaderboard overnight, or an unexpected LiveBench refresh that elevates a challenger; no such catalyst has emerged in the past week.
Experimental AI-generated summary using Polymarket data · Updated
Which company will have the best AI model for math on March 31?
OpenAI 99.5%
Google <1%
Anthropic <1%
Z.ai <1%
DeepSeek <1%
Mistral <1%
Alibaba <1%
xAI <1%
Moonshot <1%
$487,181 Vol.
If two models are tied for the highest LiveBench Mathematics Average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “Mathematics Average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the LiveBench AI leaderboard comes back online and will resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
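A minimal sketch of that resolution logic, assuming a hypothetical snapshot of the Mathematics Average column keyed by "Company / Model" strings; the scores below are illustrative placeholders echoing the summary above, not a live read of livebench.ai:

```python
# Hypothetical resolution sketch: pick the company behind the top
# LiveBench "Mathematics Average" score at check time, breaking ties
# alphabetically by company name as the market rules describe.

def resolve_market(leaderboard: dict[str, float]) -> str:
    """leaderboard maps "Company / Model" -> Mathematics Average score."""
    best_score = max(leaderboard.values())
    # All companies with a model at the top score (handles exact ties).
    tied_companies = {
        entry.split(" / ")[0]
        for entry, score in leaderboard.items()
        if score == best_score
    }
    # Tie-break: whichever company name comes first alphabetically.
    return min(tied_companies)

# Illustrative snapshot with placeholder values from the summary above.
snapshot = {
    "OpenAI / GPT-5.4 Thinking xHigh Effort": 94.15,
    "Google / Gemini 3.1 Pro Preview": 91.04,
}
print(resolve_market(snapshot))  # -> "OpenAI"
```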
Market Opened: Dec 12, 2025, 1:25 PM ET
Resolver
0x2F5e3684c...
Frequently Asked Questions