Anthropic's Claude Opus 4.6 and Sonnet 4.6, released in February 2026, have lifted the company's frontier large language models to around 33-35% accuracy on Humanity's Last Exam, a multi-modal benchmark of 2,500 expert-level questions across more than 100 subjects that was finalized in April 2025. These scores are a significant jump from the prior Claude 4.5 models, which scored under 14%, and reflect advances in extended reasoning and agentic capabilities amid intense competition from OpenAI's GPT-5 series (up to 42%) and Google's Gemini 3 Pro (around 38-45%). With no major updates in March, trader sentiment hinges on Anthropic's rapid iteration pace, which could deliver Claude 4.7 or 5.0 before the June 30 resolution date, and on evolving leaderboard methodologies that distinguish tool-assisted from standard evaluations. Key things to watch: official HLE dashboard submissions and developer conference announcements.
Experimental AI-generated summary based on Polymarket data
$187,860 volume
35%+: 92%
45%+: 41%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently Asked Questions