OpenAI's GPT-5.4 Thinking xHigh Effort model commands a dominant 94.15% score in LiveBench's Mathematics Average category—the key resolution metric for this market—well ahead of Google's Gemini 3.1 Pro Preview at 91.04% and Anthropic's Claude 4.6 Opus at 89.32%, fueling trader consensus at 98.5% implied probability as of March 29. This lead stems from OpenAI's advanced chain-of-thought reasoning optimizations, demonstrated in recent LiveBench updates that refreshed math tasks from competitions and arXiv papers, maintaining contamination-free evaluations. With resolution just two days away on March 31, the market reflects low risk of displacement absent a surprise model release from rivals like Anthropic or DeepSeek, though a late-breaking benchmark surge from Google's Gemini series or xAI's Grok could challenge it if evaluations update favorably.
Experimental AI summary based on Polymarket data · Updated
OpenAI 98.0%
Google <1%
xAI <1%
DeepSeek <1%
$484,735 volume

OpenAI 98%
Google <1%
xAI <1%
DeepSeek <1%
Anthropic <1%
Z.ai <1%
Mistral <1%
Alibaba <1%
Moonshot <1%
If two models are tied for the highest LiveBench Mathematics Average score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order.
The primary source of resolution for this market will be LiveBench’s AI leaderboard, specifically the “Mathematics Average” category, found at livebench.ai. If this resolution source is unavailable at check time, this market will remain open until the LiveBench AI leaderboard comes back online and will resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
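The resolution rule above is deterministic: take the top Mathematics Average score, and break ties alphabetically by the company name as written in the market group. A minimal sketch of that logic, using illustrative scores rather than live leaderboard data:

```python
# Hypothetical sketch of the market's resolution rule: the highest
# LiveBench Mathematics Average wins; ties break in favor of the
# company name that comes first alphabetically. Scores are made up
# for illustration, not taken from the live leaderboard.
def resolve(scores: dict[str, float]) -> str:
    # min() over the pair (-score, name): the lowest negative score is
    # the highest score, and among equal scores the alphabetically
    # first name wins, matching the stated tie-break rule.
    return min(scores, key=lambda company: (-scores[company], company))

# Clear leader case:
print(resolve({"OpenAI": 94.15, "Google": 91.04, "Anthropic": 89.32}))  # → OpenAI
# Tie case: alphabetical order decides.
print(resolve({"Google": 94.15, "Anthropic": 94.15}))  # → Anthropic
```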
Market opened: Dec 12, 2025, 1:25 PM ET
Resolver
0x2F5e3684c...