OpenAI's latest models, including o1-preview and GPT-4o, have achieved only single-digit scores on Humanity’s Last Exam, a rigorous 2,500-question benchmark from the Center for AI Safety and Scale AI designed to test frontier AI capabilities across expert domains: o1 scores around 8%, and Claude 3.5 Sonnet leads at 9%. Released in September 2024, o1 marked a leap on reasoning benchmarks, yet Humanity’s Last Exam remains a tough hurdle that reflects the limits of current large language models. Trader sentiment hinges on OpenAI's aggressive roadmap, including a full o1 rollout soon and a potential GPT-5 or "Orion" by mid-2025, amid intensifying competition from Anthropic and Google. Key catalysts are upcoming model announcements, developer conferences, and benchmark updates before the June 30, 2025 resolution, though timelines often slip in AI development.
Experimental AI-generated summary referencing Polymarket data · Updated
50%+
31%
$0.00 Vol.
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market Opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...


Beware of external links.
Frequently Asked Questions