Google's Gemini models currently sit at the top of the Humanity’s Last Exam (HLE) public evaluation leaderboard, with Gemini 2.5 Pro Preview scoring 20.3% as of mid-June updates. That is level with OpenAI's o3 (20.3%) and ahead of Anthropic's Claude 4 Opus (17.4%), reflecting incremental advances in reasoning capabilities amid fierce competition. The benchmark, curated by AI safety experts with 100 ultra-hard questions spanning expert domains, remains unsolved by frontier large language models, underscoring persistent gaps in superhuman performance. Trader consensus implies skepticism about a major score leap by June 30, given that model training cycles typically take months and no announced Gemini release targets HLE. Watch for Google DeepMind announcements at upcoming AI conferences, or for scaling runs that could shift the dynamics.
Experimental AI-generated summary referencing Polymarket data · Updated
$132,621 Vol.
40% or higher: 98%
45% or higher: 83%
50% or higher: 40%
55% or higher: 16%
60% or higher: 10%
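Because each outcome is quoted as "X% or higher", the prices listed above are cumulative. A minimal sketch of how they could be converted into implied probabilities for each score range follows. It assumes each price can be read directly as a probability and ignores fees, spreads, and the fact that separately traded outcomes need not be perfectly consistent; the function name `bucket_probabilities` is hypothetical and used only for illustration.

```python
# Threshold prices copied from the outcome list above:
# (minimum score in %, implied probability in %).
thresholds = [(40, 98), (45, 83), (50, 40), (55, 16), (60, 10)]


def bucket_probabilities(thresholds):
    """Convert 'score >= X' probabilities into per-range probabilities.

    Assumes the thresholds are sorted ascending and that each price is
    a probability in percent (an assumption, not a Polymarket rule).
    """
    buckets = {}

    # Everything not covered by the lowest threshold falls below it.
    first_cut, first_prob = thresholds[0]
    buckets[f"below {first_cut}%"] = 100 - first_prob

    # The chance of landing between two adjacent thresholds is the
    # difference of their cumulative probabilities.
    for (low, p_low), (high, p_high) in zip(thresholds, thresholds[1:]):
        buckets[f"{low}% to below {high}%"] = p_low - p_high

    # The highest threshold is already an open-ended range.
    last_cut, last_prob = thresholds[-1]
    buckets[f"{last_cut}% or higher"] = last_prob
    return buckets


buckets = bucket_probabilities(thresholds)
for label, prob in buckets.items():
    print(f"{label}: {prob}%")
```

Under these assumptions the largest single slice is the 45% to below 50% range, since the cumulative price drops most sharply between the 45% and 50% thresholds.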
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 29, 2026, 12:50 PM ET
Resolver
0x65070BE91...
Frequently Asked Questions