Google DeepMind's Gemini models have not yet publicly reported scores on Humanity’s Last Exam, a rigorous crowdsourced benchmark of roughly 2,500 expert-level questions across math, science, and the humanities, launched by the Center for AI Safety and Scale AI in January 2025 to probe frontier AI capabilities toward AGI. Current leaderboards show top models such as Anthropic’s Claude 3.5 Sonnet and OpenAI’s o1-preview scoring below 20%, underscoring the exam’s difficulty amid a competitive race in which reasoning benchmarks increasingly define AI positioning. Gemini 1.5 Pro excels at multimodal tasks and long-context processing per recent Google I/O demos, but lacks demonstrated strength on the pure reasoning tests, such as GPQA or MATH, that correlate with Last Exam performance. With under two weeks until June 30, traders are watching for potential Gemini 2.0 previews or benchmark disclosures at upcoming developer events, though historical release timelines suggest delays are common.
Experimental AI-generated summary referencing Polymarket data · Updated

$132,609 Vol.

Outcome    Chance
40%+       98%
45%+       82%
50%+       40%
55%+       17%
60%+       10%
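The "X%+" chances above are cumulative: each quote is the market-implied probability that the score meets or exceeds that threshold, so differencing adjacent quotes yields the implied probability of the score landing in each bucket. A minimal sketch of that arithmetic, using the prices shown on this page (market prices, not official benchmark results):

```python
# Market-implied survival function: P(score >= threshold) for each cutoff.
thresholds = [40, 45, 50, 55, 60]          # score cutoffs, in %
survival = [0.98, 0.82, 0.40, 0.17, 0.10]  # quoted chances for "X%+"

# P(t_i <= score < t_{i+1}) = S(t_i) - S(t_{i+1}); the last bucket (60%+) is open-ended.
bucket_probs = [round(survival[i] - survival[i + 1], 2)
                for i in range(len(survival) - 1)]
bucket_probs.append(survival[-1])

print(bucket_probs)  # [0.16, 0.42, 0.23, 0.07, 0.1]
```

Note the buckets sum to 0.98, leaving an implied 2% chance of a score below 40%. If any differenced value came out negative, the quotes would be internally inconsistent and would admit an arbitrage across the outcomes.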
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market Opened: Jan 29, 2026, 12:50 PM ET
Resolver
0x65070BE91...