Google's Gemini 3 Deep Think mode set a record of 48.4% on Humanity's Last Exam, a frontier benchmark of 2,500 expert-level questions across disciplines, in February 2026. It did so without tools, surpassing prior leaders such as GPT-5 Pro at 31-37%. Subsequent Gemini 3.1 Pro Preview scores of 44.7-45.9% on independent leaderboards (Artificial Analysis, Wikipedia) keep it ahead of OpenAI's GPT-5.4 (41.6%) and Anthropic's Claude Opus 4.6 (34.4%) in a tight field, which points to rapid gains in reasoning capabilities. Traders are watching Google I/O in May for potential Gemini 4 previews or upgrades, with three months left until the June 30 resolution. Historical patterns suggest iterative improvements could push scores higher, though the models' calibration errors show that overconfidence persists.
Experimental AI-generated summary based on Polymarket data · Updated
Google Gemini score on Humanity's Last Exam by June 30?
Volume: $132,621

Outcome   Chance
40%+      98%
45%+      83%
50%+      40%
55%+      16%
60%+      9%
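
The threshold prices above are cumulative: each one covers every score at or above its cutoff. As a rough illustration, the sketch below subtracts adjacent prices to get an implied probability for each score band. It assumes the listed percentages can be read directly as probabilities (ignoring fees, spreads, and thin liquidity), and the function name `implied_buckets` is hypothetical, not part of any Polymarket API.

```python
# Cumulative "Yes" prices from the market listing, in percent,
# keyed by the score threshold (e.g. 45 means "45%+").
cumulative = {40: 98, 45: 83, 50: 40, 55: 16, 60: 9}


def implied_buckets(cum):
    """Convert cumulative threshold prices into per-band probabilities.

    Assumption: prices are treated as probabilities and are
    non-increasing as the threshold rises.
    """
    thresholds = sorted(cum)
    # Probability mass below the lowest threshold.
    buckets = {f"below {thresholds[0]}%": 100 - cum[thresholds[0]]}
    # Mass between each pair of adjacent thresholds.
    for lo, hi in zip(thresholds, thresholds[1:]):
        buckets[f"{lo}% to {hi}%"] = cum[lo] - cum[hi]
    # Mass at or above the highest threshold.
    buckets[f"{thresholds[-1]}%+"] = cum[thresholds[-1]]
    return buckets


buckets = implied_buckets(cumulative)
for label, pct in buckets.items():
    print(f"{label}: {pct}%")
```

Under these assumptions, the largest share of implied probability falls in the 45% to 50% band, which is where the scores quoted in the summary sit.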
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 29, 2026, 12:50 PM ET
Resolver: 0x65070BE91...
Do not trust external links.
Frequently Asked Questions