OpenAI's latest GPT-5.4 model leads the Humanity's Last Exam benchmark at approximately 41.6% accuracy without tools as of late March 2026, driving trader consensus toward moderate score ranges amid rapid frontier AI progress. That is roughly fifteen times GPT-4o's initial 2.7%, a gain fueled by advances in reasoning chains and extended thinking modes, though scores remain below expert human baselines of around 90% in specialized domains. Google's Gemini 3 Pro (39.9%) and Anthropic's Claude Opus 4.6 (34.4%) keep the race tight, and tool-augmented runs push GPT-5.4 pro to 58.7%. Traders are watching for a GPT-5.5 or successor release before June 30, as well as developer previews or earnings calls that could signal capability leaps, but historical delays and benchmark calibration issues make it uncertain whether scores will reach higher thresholds.
Experimental AI-generated summary using Polymarket data · Updated
50%+
31%
$3,406 Vol.
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently asked questions