Trader sentiment on OpenAI achieving a high score on Humanity’s Last Exam hinges on the wide gap between current GPT performance and the likely threshold for resolution. The exam is a rigorous 2,500-question benchmark spanning expert-level STEM, humanities, and social sciences, jointly developed by the Center for AI Safety and Scale AI. OpenAI's GPT-4o scores just 8.57% on the public leaderboard, trailing Google's Gemini 2.5 Pro at 21.64% and Anthropic's Claude 3.5 Sonnet at 10.72%, reflecting a benchmark designed to probe for superhuman AI capabilities. OpenAI has announced no model release before June 30, and GPT-5 rumors point to late 2024 amid competitive pressure from rivals' rapid iterations. Key watchpoints include possible OpenAI preview releases or developer conference reveals, though past timelines suggest that delays are common when scaling large language models. Market-implied odds capture trader consensus on these technical and timing hurdles.
Experimental AI summary based on Polymarket data
50% or above: 31%
Volume: $0.00
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91... Proposed outcome: Yes
No dispute
Final outcome: Yes
Frequently Asked Questions