Anthropic's Claude Opus 4.6, released February 5, 2026, has lifted trader sentiment by topping the Humanity’s Last Exam leaderboard at 53% accuracy. That score used advanced agentic setups (web tools, code execution, and up to 3 million tokens of reasoning compute) and outpaced rivals such as OpenAI's GPT-5 variants and Google's Gemini models. The benchmark consists of 2,500 expert-level questions that test multidisciplinary reasoning at the frontier of human expertise; prior Claude versions scored under 15% in standard evaluations. With three months until resolution, anticipation is building for a potential Q2 launch of Claude 5, which could push scores toward 60% or higher, though decontamination protocols and benchmark updates leave the final leaderboard standings uncertain.
Experimental AI summary based on Polymarket data
$187,516 volume
35%+: 94%
45%+: 45%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91... proposed outcome: Yes
No dispute
Final outcome: Yes