OpenAI's latest reasoning-focused model, o1-preview, has posted notable gains on challenging benchmarks but still scores far below the top of Humanity's Last Exam (HLE), a rigorous 3,000-question test spanning math, science, and the humanities, launched by the Center for AI Safety and Scale AI in January 2025. Current leaderboard scores show o1 at around 7%, while competitors such as Google's Gemini 2.0 Experimental lead at 22%, a gap that underscores how quickly competitive dynamics in AI capabilities are shifting. No official OpenAI announcement targets HLE specifically; instead, trader sentiment hinges on the anticipated release of GPT-5 (or "Orion") in early 2025, which could incorporate more advanced chain-of-thought reasoning to close the gap. Key watchpoints include December developer previews and Q1 2025 model launches, set against compute-scaling constraints and ongoing safety evaluations.
Experimental AI-generated summary using Polymarket data · Updated
50%+
31%
$0.00 Vol.
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently Asked Questions