OpenAI's GPT-5.4 model, released March 5, 2026, has vaulted near the top of the Humanity's Last Exam leaderboard with scores of 41.6-44.3% without tools—surpassing prior GPT-5 iterations and Anthropic's Claude Opus 4.6 (34.4%), while running just behind Google's Gemini 3.1 Pro (44.7%). This frontier benchmark, comprising 2,500 expert-vetted questions across 100+ subjects, tests deep domain knowledge beyond saturated tests like MMLU. Rapid scaling in model parameters and reasoning chains explains the jump from GPT-5.2's ~28%, reflecting trader consensus on accelerating AI capabilities. With three months to the June 30 resolution date, upcoming releases such as a potential GPT-5.5 could push scores higher, though reported calibration errors signal persistent overconfidence risks amid competitive pressure from xAI and Meta.
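The "calibration errors" mentioned above refer to the gap between a model's stated confidence and its actual accuracy, which the HLE leaderboard reports alongside raw scores. Below is a minimal sketch of one common metric, expected calibration error (ECE); the binning scheme and the sample numbers are illustrative assumptions, not the leaderboard's exact methodology or data:

```python
# Sketch: expected calibration error (ECE) over confidence bins.
# All numbers below are illustrative, not taken from the HLE leaderboard.

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Half-open bins; fold confidence == 1.0 into the last bin.
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == hi)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece

# An overconfident model: ~90% stated confidence, 50% actual accuracy.
confs = [0.95, 0.9, 0.85, 0.9]
right = [1, 0, 0, 1]
print(round(expected_calibration_error(confs, right), 3))  # → 0.4
```

A large ECE with a high headline score is exactly the "overconfidence" pattern the summary flags: the model answers well on average but overstates how sure it is.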
Experimental AI summary based on Polymarket data
Outcome "50%+": 31% chance · $0.00 volume
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opens: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...