Recent Anthropic model iterations, including Claude Opus 4.6 and variants with extended thinking and tool use, have delivered frontier scores on Humanity’s Last Exam (HLE), a 2,500-question multidisciplinary benchmark of expert-level problems, reaching 34-53% accuracy depending on configuration. These results position Claude competitively against OpenAI’s GPT-5 series and Google’s Gemini previews amid rapid iteration cycles. Trader focus centers on whether further internal scaling, prompt optimizations, or a new release before June 30 can push scores past key thresholds like 45%. Historical patterns show frontier large language models improving several percentage points monthly on such benchmarks when new capabilities ship, though exact timelines remain uncertain.
Experimental AI-generated summary referencing Polymarket data. This is not trading advice and has no effect on how this market resolves. · Updated
$316,664 volume
45% or higher: 63%
50%+: 36%
55%+: 11%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently Asked Questions