Trader sentiment on Anthropic's Claude achieving a breakthrough score on Humanity’s Last Exam centers on current low performance and uncertain scaling timelines. The exam is a rigorous benchmark from the Center for AI Safety and Scale AI that tests frontier large language model reasoning across 2,500 expert-level questions, where humans average ~50%. Claude 3.5 Sonnet, released June 20, 2024, scores just 9% on the public eval set, slightly behind OpenAI's o1-preview at 12%; no model has yet cracked the multi-hop reasoning barriers despite rapid gains on other benchmarks. Anthropic has made no announcements targeting this exam specifically in recent weeks, but traders are watching for potential Claude 4 training updates or demos by year-end and for competitive pressure from Google DeepMind and xAI. June 30 marks a key resolution deadline amid AI safety debates.
Experimental AI summary based on Polymarket data
$186,468 volume
35%+: 93%
45%+: 36%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently Asked Questions