Recent Anthropic model iterations, including Claude Opus 4.6 and variants with extended thinking and tool use, have delivered frontier scores on Humanity’s Last Exam (HLE), a 2,500-question multidisciplinary benchmark of expert-level problems, reaching 34-53% accuracy depending on configuration. These results position Claude competitively against OpenAI’s GPT-5 series and Google’s Gemini previews amid rapid iteration cycles. Trader focus centers on whether further internal scaling, prompt optimizations, or a new release before June 30 can push scores past key thresholds like 45%. Historical patterns show frontier large language models improving several percentage points monthly on such benchmarks when new capabilities ship, though exact timelines remain uncertain.
Experimental AI-generated summary based on Polymarket data. This is not trading advice and plays no role in the resolution of this market. · Updated
Claude score on Humanity’s Last Exam by June 30?
$316,664 Vol.
45%+: 63%
50%+: 36%
55%+: 11%
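The three outcomes above are score thresholds, so each listed percentage can be read as an approximate market-implied chance that the score lands at or above that level. A minimal Python sketch of how those figures split into disjoint score ranges follows. It assumes the outcomes are nested and that each price approximates a probability; real prices include spreads and fees, so the results are rough. The names `threshold_prices` and `bucket_probabilities` are hypothetical and are not part of any Polymarket API.

```python
# Implied chances (in percent) for the "score at or above X%" outcomes,
# copied from the listing above.
threshold_prices = {45: 63, 50: 36, 55: 11}


def bucket_probabilities(prices):
    """Split nested threshold prices into disjoint score ranges.

    Assumption: each price approximates P(score >= threshold), and the
    outcomes are nested, so P(a <= score < b) = P(>= a) - P(>= b).
    """
    thresholds = sorted(prices)
    lowest, highest = thresholds[0], thresholds[-1]
    # Everything not covered by the lowest threshold falls below it.
    buckets = {f"below {lowest}%": 100 - prices[lowest]}
    # Each middle range is the difference between adjacent thresholds.
    for low, high in zip(thresholds, thresholds[1:]):
        buckets[f"{low}% to below {high}%"] = prices[low] - prices[high]
    # The top range is the price of the highest threshold itself.
    buckets[f"{highest}% or higher"] = prices[highest]
    return buckets


for label, pct in bucket_probabilities(threshold_prices).items():
    print(f"{label}: {pct}%")
```

Under these assumptions the four ranges sum to 100%, which serves as a quick consistency check on the listed prices.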
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently asked questions