Anthropic’s latest Claude Opus models, particularly the 4.8 variant using adaptive reasoning and maximum-effort modes, have reached 45.7% accuracy on Humanity’s Last Exam, a 2,500-question benchmark spanning advanced mathematics, sciences, and humanities designed to test frontier large language model capabilities. Successive releases incorporating extended thinking chains have driven steady leaderboard gains over the past several months, narrowing the gap with leaders like Gemini 3.1 Pro. With the June 30 resolution deadline approaching, any final model updates, fine-tunes, or verified submissions before that date represent the main near-term catalysts for shifts in trader consensus on whether Claude crosses key thresholds such as 45% or 50%.
Experimental AI summary referencing Polymarket data. This is not trading advice and has no effect on how this market resolves. · Updated
Claude score on Humanity’s Last Exam by June 30?
$297,340 Vol.
45%+: 53%
50%+: 19%
55%+: 7%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Beware of external links.
Frequently asked questions