Google's Gemini models currently sit at the top of the Humanity’s Last Exam (HLE) public evaluation leaderboard: Gemini 2.5 Pro Preview scored 20.3% as of the mid-June updates, level with OpenAI's o3 (20.3%) and ahead of Anthropic's Claude 4 Opus (17.4%), reflecting incremental advances in reasoning capabilities amid fierce competition. This benchmark, curated by AI safety experts with 100 ultra-hard questions spanning expert domains, remains unsolved by frontier large language models, underscoring persistent gaps in superhuman performance. Trader consensus implies skepticism about a major score leap by June 30, given that model training cycles typically take months and no announced Gemini release targets HLE. Watch for Google DeepMind announcements at upcoming AI conferences, or for scaling runs, that could shift the dynamics.
Experimental AI-generated summary based on Polymarket data · Updated
$132,621 Vol.
40%+: 98%
45%+: 83%
50%+: 40%
55%+: 16%
60%+: 10%
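The threshold prices listed above can be read as cumulative quotes, so the difference between two adjacent thresholds gives the implied chance that the top score lands in that band. The following is a minimal sketch of that arithmetic, not Polymarket's own method. It assumes each price is a coherent probability P(score >= threshold) and ignores fees and bid-ask spreads; the function name `implied_bands` is hypothetical.

```python
# Quoted prices copied from the listing above, keyed by score threshold (%).
# Assumption: each price is read as P(top score >= threshold).
cumulative = {40: 0.98, 45: 0.83, 50: 0.40, 55: 0.16, 60: 0.10}


def implied_bands(cumulative):
    """Difference adjacent cumulative quotes into per-band probabilities."""
    thresholds = sorted(cumulative)
    bands = {}
    # Probability mass below the lowest listed threshold.
    bands[f"<{thresholds[0]}%"] = 1.0 - cumulative[thresholds[0]]
    # Mass between each pair of adjacent thresholds.
    for lo, hi in zip(thresholds, thresholds[1:]):
        bands[f"{lo}-{hi}%"] = cumulative[lo] - cumulative[hi]
    # Mass at or above the highest listed threshold.
    bands[f"{thresholds[-1]}%+"] = cumulative[thresholds[-1]]
    # Round to whole percentage points to hide floating-point noise.
    return {band: round(p, 2) for band, p in bands.items()}


bands = implied_bands(cumulative)
for band, p in bands.items():
    print(f"{band:>7}: {p:.0%}")
```

Under these assumptions the largest implied band is 45-50%, and the bands sum to 100% because the quotes decrease monotonically as the threshold rises.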
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 29, 2026, 12:50 PM ET
Resolver
0x65070BE91...
Beware of external links.
Frequently asked questions