Google's Gemini 3 Deep Think achieved a landmark 48.4% on Humanity's Last Exam—a frontier benchmark of 2,500 expert-level questions across math, science, and humanities—without tools, as announced in February 2026, surpassing prior Gemini 3 Pro scores of 37-38% from late 2025. Independent leaderboards like Artificial Analysis now show Gemini 3.1 Pro Preview at 44.7%, edging GPT-5.4's 41.6% amid intensifying rivalry with Anthropic's Claude Opus 4.6 (53.1%) and xAI's Grok-4. Trader consensus reflects rapid scaling in AI reasoning capabilities, but verification discrepancies and benchmark calibration issues temper optimism. Google I/O in May could reveal Gemini 4 advancements, pivotal before the June 30 cutoff, though delays in model releases remain a key risk.
Experimental AI-generated summary based on Polymarket data
$264,259 volume
40%+: 95%
45%+: 83%
50%+: 39%
55%+: 17%
60%+: 10%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 29, 2026, 12:50 PM ET
Resolver
0x65070BE91...
Frequently Asked Questions