OpenAI's GPT-5.4, released in early March 2026, has lifted the company's score on Humanity's Last Exam to 41-44% without tools, per leaderboards from Scale Labs and Artificial Analysis. The exam is a rigorous 2,500-question benchmark that tests frontier large language model reasoning across 100+ expert-level subjects. The new score is a gain of 8-10 percentage points over GPT-5.2's mid-30s within months, driven by enhanced chain-of-thought reasoning and scaling laws; with tools, GPT-5.4 reaches 58%. Competitors like Google's Gemini 3.1 Pro Preview (46%) and Meta's Muse Spark (up to 50%) intensify pressure. Traders eye potential GPT-5.5 previews or updates by June 30 amid rapid iteration cycles, but benchmark revisions and evaluation variance add uncertainty to how the market's thresholds will resolve.
Experimental AI-generated summary based on Polymarket data. This is not trading advice and plays no role in the resolution of this market.
50%+: 63%
$6,937 Vol.
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Beware of external links.
Frequently asked questions