Anthropic's Claude Opus 4.6, released February 5, 2026, achieved 34.4% accuracy on Humanity's Last Exam, a frontier benchmark of 2,500 expert-level questions across math, sciences, and humanities, using max thinking mode. That places it second on the Scale Labs leaderboard, behind OpenAI's GPT-5.4 at 44.3%. The score is a significant leap from prior Claude scores of around 10-14%, driven by enhanced chain-of-thought reasoning and agentic capabilities, though calibration errors reveal persistent overconfidence. No new model releases or benchmark updates have emerged in the past 30 days, and Anthropic's focus in March was on user preference studies. Traders are watching for Claude 5 or iterative upgrades before June 30 amid intensifying rivalry with Google Gemini and OpenAI, where safety constraints may temper Anthropic's scaling pace.
Experimental AI-generated summary with Polymarket data · Updated
$187,560 Vol.
35%+: 94%
45%+: 40%
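The two quoted prices (94% for 35%+ and 40% for 45%+) can be turned into rough probabilities for each score range. The sketch below assumes that each price can be read directly as a probability and that the thresholds are nested, so a score of 45%+ also counts as 35%+. The function name `implied_buckets` and the bucket labels are illustrative only and are not part of the market's rules.

```python
def implied_buckets(threshold_prices):
    """Convert 'score >= threshold' prices (in percent) into bucket probabilities.

    Assumption: each price is read as a probability, and the thresholds are
    nested, so the gap between two adjacent prices is the probability of
    landing between those two thresholds.
    """
    buckets = {}
    upper_price = 100   # probability mass not yet assigned to a bucket
    lower = None        # previous (lower) threshold, if any
    for threshold in sorted(threshold_prices):
        price = threshold_prices[threshold]
        label = f"<{threshold}" if lower is None else f"{lower}-{threshold}"
        buckets[label] = upper_price - price
        upper_price = price
        lower = threshold
    # Whatever remains sits at or above the highest threshold.
    buckets[f"{lower}+"] = upper_price
    return buckets


# Prices as quoted on the page, in percent.
quotes = {35: 94, 45: 40}
print(implied_buckets(quotes))  # {'<35': 6, '35-45': 54, '45+': 40}
```

Under these assumptions, the quotes imply roughly a 54% chance of a top score between 35% and 45%, and about a 6% chance of staying below 35%. Fees, spreads, and thin liquidity mean real prices only approximate probabilities.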
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently Asked Questions