Google's Gemini 3.1 Pro Preview currently leads the Humanity's Last Exam (HLE) leaderboard with scores around 45% on this frontier benchmark of 2,500 PhD-level questions testing advanced reasoning across math, science, and humanities, per independent evaluations like Artificial Analysis and Scale Labs. This follows February 2026's Gemini 3 Deep Think upgrade, which Google claims achieved 48.4% without tools, outpacing rivals like OpenAI's GPT-5.4 (41-44%) and Anthropic's Claude Opus 4.6 (34%). Rapid scaling via DeepMind's multimodal large language models has narrowed the gap to human expert performance, but calibration errors remain high, signaling overconfidence risks. Traders eye Google I/O in May for Gemini 4 previews or further optimizations before the June 30 deadline, amid intensifying AI capability races.
Experimental AI-generated summary based on Polymarket data · Updated
$132,621 Vol.
Outcome   Chance
40%+      98%
45%+      83%
50%+      40%
55%+      16%
60%+      9%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 29, 2026, 12:50 PM ET
Resolver
0x65070BE91...
Frequently asked questions