Google's Gemini models currently sit at the top of the Humanity’s Last Exam (HLE) public evaluation leaderboard: Gemini 2.5 Pro Preview scores 20.3% as of the mid-June updates, level with OpenAI's o3 (20.3%) and ahead of Anthropic's Claude 4 Opus (17.4%). The benchmark, curated by AI safety experts and made up of 2,500 ultra-hard questions spanning expert domains, remains unsolved by frontier large language models, which points to a persistent gap between current models and expert-level performance. Trader consensus implies skepticism about a major score leap by June 30, given that model training cycles typically run for months and no announced Gemini release targets HLE. Watch for Google DeepMind announcements at upcoming AI conferences, or for scaling runs that could shift the picture.
Experimental AI-generated summary based on Polymarket data
$132,621 Vol.
Outcome    Chance
40%+       98%
45%+       83%
50%+       40%
55%+       16%
60%+       10%
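The outcomes listed above are nested: a score that clears "45%+" also clears "40%+". Assuming each percentage is the market-implied chance that the score reaches at least that threshold (an interpretation, not something the page states), the differences between adjacent rows give a rough implied chance for each score band. The function name `implied_bands` and the band labels are hypothetical, chosen only for this sketch:

```python
# Assumption: each value is the market-implied chance (in percent) that the
# score reaches AT LEAST the given threshold. Values are copied from the page.
at_least = {40: 98, 45: 83, 50: 40, 55: 16, 60: 10}


def implied_bands(at_least):
    """Turn nested 'at least X' chances into a chance for each score band."""
    thresholds = sorted(at_least)
    bands = {}
    # Chance of finishing below the lowest listed threshold.
    bands[f"below {thresholds[0]}%"] = 100 - at_least[thresholds[0]]
    # Chance of landing between two adjacent thresholds.
    for low, high in zip(thresholds, thresholds[1:]):
        bands[f"{low}% to under {high}%"] = at_least[low] - at_least[high]
    # Chance of reaching the highest listed threshold or more.
    bands[f"{thresholds[-1]}%+"] = at_least[thresholds[-1]]
    return bands


bands = implied_bands(at_least)
for label, chance in bands.items():
    print(f"{label}: {chance}%")
```

Each outcome trades as its own market, so the listed chances need not be perfectly consistent with one another; treat the band figures as an approximation rather than exact prices.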
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 29, 2026, 12:50 PM ET
Resolver
0x65070BE91...
Frequently asked questions