Recent Anthropic model iterations, including Claude Opus 4.6 and variants with extended thinking and tool use, have delivered frontier scores on Humanity’s Last Exam (HLE), a 2,500-question multidisciplinary benchmark of expert-level problems, reaching 34-53% accuracy depending on configuration. These results position Claude competitively against OpenAI’s GPT-5 series and Google’s Gemini previews amid rapid iteration cycles. Trader focus centers on whether further internal scaling, prompt optimizations, or a new release before June 30 can push scores past key thresholds like 45%. Historical patterns show frontier large language models improving several percentage points monthly on such benchmarks when new capabilities ship, though exact timelines remain uncertain.
Experimental AI-generated summary referencing Polymarket data. This is not trading advice and plays no role in the resolution of this market. · Updated
$316,664 volume
45%+: 63%
50%+: 36%
55%+: 11%
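The three outcomes listed above appear to be nested thresholds (a score of 50% or more would also satisfy 45%+), so the displayed percentages can be differenced into disjoint score ranges. The following is a minimal sketch of that arithmetic, assuming each displayed percentage can be read as the market-implied probability that the top score reaches that threshold; the function name `implied_bins` and the bin labels are hypothetical and not part of Polymarket.

```python
# Displayed percentages for the nested "score at least X%" outcomes, as fractions.
# Assumption: each value is read as P(top score >= threshold).
at_least = {45: 0.63, 50: 0.36, 55: 0.11}


def implied_bins(at_least):
    """Convert nested P(score >= t) values into disjoint bin probabilities."""
    thresholds = sorted(at_least)
    # Everything below the lowest threshold.
    bins = {f"<{thresholds[0]}%": 1.0 - at_least[thresholds[0]]}
    # Ranges between consecutive thresholds.
    for lo, hi in zip(thresholds, thresholds[1:]):
        bins[f"{lo}-{hi}%"] = at_least[lo] - at_least[hi]
    # Everything at or above the highest threshold.
    bins[f"{thresholds[-1]}%+"] = at_least[thresholds[-1]]
    # Round away floating-point noise from the subtractions.
    return {label: round(p, 2) for label, p in bins.items()}


bins = implied_bins(at_least)
for label, p in bins.items():
    print(f"{label}: {p:.0%}")
```

Under that reading, roughly 37% of the implied probability sits below 45%, 27% between 45% and 50%, 25% between 50% and 55%, and 11% at 55% or above. Real prices include spreads and fees, so these figures are only approximate.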
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Beware of external links.
Frequently Asked Questions