Google's Gemini 3.1 Pro Preview, released in February 2026, currently leads the Humanity's Last Exam leaderboard with 46.4% accuracy on the 2,500-question benchmark of expert-level problems across math, science, and the humanities. That is a clear advance over Gemini 3 Pro's 37.5% score from late 2025 and puts it ahead of OpenAI's GPT-5.4 Pro at 44.3%, reflecting stronger reasoning chains and fewer hallucinations on frontier tasks. Recent May 2026 API enhancements, including the Gemini 3.1 Flash-Lite rollout, point to ongoing optimization of thinking modes, which traders see as likely to push scores higher before the June 30 deadline. Benchmark saturation and evaluation variance add some uncertainty for thresholds above 50%, but the pace of verified releases positions Google to maintain or widen its lead through targeted model updates.
Experimental AI summary based on Polymarket data. This is not trading advice and does not affect how this market resolves.
$312,088 volume
50%+: 70%
55%+: 27%
60%+: 6%
The resolution source will be the official Humanity’s Last Exam leaderboard: https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 29, 2026, 12:50 PM ET
Resolver
0x65070BE91...
Frequently Asked Questions