Anthropic's recent Claude Opus 4.7 and 4.6 releases, which incorporate extended reasoning modes, have pushed scores on Humanity’s Last Exam into the mid-to-high 30% range on the Scale AI leaderboard, narrowing the gap with leaders such as Gemini 3.1 Pro Preview at around 44%. This progress stems from targeted improvements on graduate-level questions across math, science, and the humanities, where earlier Claude versions scored below 20%. Competitive pressure from OpenAI’s GPT-5 series and Google’s Gemini updates continues to accelerate capability gains, though models still trail expert human performance, which sits near 90%. With the June 30 deadline weeks away, a surprise model iteration or benchmark-optimized training run could still shift outcomes, but current trajectories suggest that incremental rather than breakthrough gains remain most likely.
Experimental AI-generated summary based on Polymarket data. This is not trading advice and has no bearing on how this market resolves.
$283,609 volume
45%+: 20%
50%+: 9%
55%+: 4%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver: 0x65070BE91...
Frequently asked questions