OpenAI's GPT-5.4 models currently score 41-44% on Humanity's Last Exam, a frontier benchmark of 2,500 expert-level questions across math, sciences, and humanities designed to resist saturation, trailing Google's Gemini 3.1 Pro Preview at 44-46% per leaderboards like Artificial Analysis and Scale Labs. This reflects rapid progress from single-digit accuracies in early 2025, driven by scaled reasoning capabilities in recent releases including GPT-5.4 (March 2026), GPT-5.4-Cyber (April 14), and GPT-Rosalind (April 16) for specialized domains. Trader sentiment hinges on whether iterative improvements or a GPT-5.5 successor can surpass key thresholds like 50% by June 30, amid competitive pressure from Anthropic's Claude Opus 4.x and xAI's Grok-4; watch for new evaluations at upcoming AI conferences or official submissions.
Experimental AI-generated summary based on Polymarket data. This is not trading advice and plays no role in the resolution of this market. · Updated
OpenAI GPT score on Humanity's Last Exam by June 30?
$15,078 Vol.
50%+
53%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Be careful with external links.
Frequently asked questions