OpenAI's GPT-5.4, released in early March 2026, scored 39.8% to 44.3% without tools on Humanity's Last Exam, a frontier benchmark of 2,500 expert-level questions. That marks an 8% gain over GPT-5.2 in two months and leaves it competitive with, but behind, leaders such as Anthropic's Claude Opus 4.6 (53.1%) and Google's Gemini 3.1 Pro (51.4%). This progress, alongside iterative releases such as GPT-5.3 Codex in February, fuels trader optimism about surpassing key thresholds by June 30, though exact resolution depends on evaluation configurations. Upcoming catalysts include a potential GPT-5.5 rollout, in line with OpenAI's fast release cadence; the risks are delays and benchmark contamination. Prediction markets reflect a skin-in-the-game consensus on continued scaling toward expert human parity.
Experimental AI summary based on Polymarket data
50% or above: 54%
$3,452 volume
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently Asked Questions