Anthropic's latest Claude variants, including Fable 5 and Opus 4.8 with adaptive reasoning and max-effort modes, currently lead or rank near the top of Humanity's Last Exam (HLE) leaderboards at 45–53% accuracy, ahead of Gemini 3.1 Pro and GPT-5.4 series models. This positioning stems from 2026 releases emphasizing deeper chain-of-thought reasoning, tool integration, and targeted training on expert-level academic tasks, building on earlier gains from single-digit scores in 2025. With the June 30 deadline approaching, traders focus on whether Anthropic will release or optimize another iteration before then, as even modest gains in calibration or multi-step problem-solving could solidify or extend its edge on the 2,500-question benchmark. Competitive pressure from Google and OpenAI updates remains a key swing factor in this narrow window.
Experimental AI-generated summary referencing Polymarket data. This is not trading advice and plays no role in how this market is resolved.
$362,082 Vol.
45%+: 47%
50%+: 26%
55%+: 12%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market Opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently Asked Questions