OpenAI's models, including o1-preview and GPT-4o, have scored only in the single digits on Humanity’s Last Exam, a rigorous 2,500-question benchmark from the Center for AI Safety and Scale AI designed to test frontier AI capabilities across expert domains: o1 sits at around 8%, and Anthropic's Claude 3.5 Sonnet leads at 9%. Released in September 2024, o1 marked a leap on reasoning benchmarks, yet Humanity’s Last Exam remains a tough hurdle that reflects the current limits of large language models. Trader sentiment hinges on OpenAI's aggressive roadmap, including a full o1 rollout soon and a potential GPT-5 or "Orion" by mid-2025, amid intensifying competition from Anthropic and Google. Key catalysts are upcoming model announcements, developer conferences, and benchmark updates before the June 30, 2025 resolution, though timelines often slip in AI development.
Experimental AI-generated summary referencing Polymarket data · Updated
50% or more: 31%
$0.00 Vol.
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently Asked Questions