Trader consensus on Polymarket reflects skepticism that Google Gemini will post a breakthrough score on Humanity’s Last Exam by June 30, 2025. The exam is a rigorous benchmark of 2,500+ expert-level questions spanning math, science, and humanities, curated by the Center for AI Safety to gauge the limits of frontier AI. Gemini 2.0 Flash Experimental, unveiled in December 2024 at a Google DeepMind demo, scored just 7.96% on the public leaderboard, trailing OpenAI’s o1-preview (8.58%) and Anthropic’s Claude 3.5 Sonnet (9.12%), which underscores persistent gaps in reasoning and novel problem-solving despite scaled-up compute. No model exceeds 10%, a sign of how difficult the benchmark is given hallucinations and context limits. Key catalysts include Google I/O in May, where Gemini 2.5 may be revealed, and mid-year scaling runs, though past delays such as Gemini 1.5 Pro’s postponements signal execution risk in a heated AI race.
Experimental AI-generated summary based on Polymarket data · Updated
Google Gemini score on Humanity’s Last Exam by June 30?
Volume: $201,858
40%+: 98%
45%+: 82%
50%+: 41%
55%+: 16%
60%+: 10%
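The thresholds above are nested (a score of 50%+ is also 45%+ and 40%+), so subtracting adjacent prices gives a rough implied probability for each score range. The sketch below is illustrative only: it assumes the listed percentages are clean probabilities with no fees or bid-ask spread, and the function name `implied_buckets` is a hypothetical helper, not part of any Polymarket API.

```python
# Market-implied probabilities (in %) that the score is at or above each
# threshold (in %), copied from the outcome list above.
at_least = [(40, 98), (45, 82), (50, 41), (55, 16), (60, 10)]


def implied_buckets(at_least):
    """Turn nested "X%+" prices into probabilities for disjoint score ranges.

    Assumes thresholds are sorted ascending and prices are non-increasing.
    """
    buckets = []
    prev_prob = 100  # the score is certainly at or above 0%
    lower = None
    for threshold, prob in at_least:
        if lower is None:
            label = f"below {threshold}%"
        else:
            label = f"{lower}% to {threshold}%"
        # Probability mass between the previous threshold and this one.
        buckets.append((label, prev_prob - prob))
        lower, prev_prob = threshold, prob
    # Whatever remains sits at or above the highest threshold.
    buckets.append((f"{lower}%+", prev_prob))
    return buckets


for label, prob in implied_buckets(at_least):
    print(f"{label}: {prob}%")
```

Read this way, the largest share of implied probability falls in the 45% to 50% range, and the ranges sum to 100% by construction.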
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 29, 2026, 12:50 PM ET
Resolver: 0x65070BE91...
Frequently asked questions