OpenAI's release of GPT-5.4 on March 5, 2026, lifted its score on Humanity's Last Exam to 39.8% without tools (52.1% with tools) and to 41.6% in the xhigh configuration on leaderboards, placing it near the top and ahead of GPT-5.2's 34.5%. The gain of roughly 5 to 7 percentage points, depending on configuration, reflects stronger reasoning and knowledge synthesis on the 2,500-question frontier benchmark, which spans expert-level math, science, and humanities; the score remains below that of human experts. Google's Gemini 3.1 Pro Preview, at 44.7%, adds competitive pressure and keeps traders focused on OpenAI's rapid iteration cycle. A GPT-5.5 or other successor could push past key thresholds by June 30, but unconfirmed release timelines, evaluation variance, and the risk of benchmark saturation temper expectations.
Experimental AI summary based on Polymarket data · Updated
50%+
50%
$3,452 volume
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Beware of external links.
Frequently Asked Questions