Trader consensus centers on Anthropic's Claude 3.5 Sonnet, released June 20, which posts low single-digit scores on Humanity’s Last Exam, a rigorous benchmark of 1,000+ expert-level questions across math, science, and humanities designed by the Center for AI Safety to probe frontier AI capabilities toward AGI. Current public evaluations put Claude 3.5 Sonnet at around 4-5%, behind OpenAI's o1-preview (8.2%); the exam is difficult enough that no large language model exceeds 10%. With the June 30 deadline days away and no new Claude iteration announced, the odds of a breakthrough are dampened. Traders are watching for last-minute evaluations or unreported internal progress, though historical benchmark timelines suggest limited upside absent a surprise model release. Competitive pressure from OpenAI and Google keeps the focus on reasoning advances.
Experimental AI summary based on Polymarket data · Updated
$187,336 volume
35%+: 94%
45%+: 49%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver: 0x65070BE91...
Frequently asked questions