Trader skepticism dominates Polymarket odds for Anthropic's Claude surpassing meaningful thresholds on the FrontierMath benchmark by June 30, 2025, driven by the model's current dismal 1.9% score on this ultra-hard test of IMO-level math reasoning, released by Epoch AI in November 2024. Claude 3.5 Sonnet lags behind OpenAI's o1-preview (10.6%) and Gemini 2.0 Flash (similarly low), highlighting competitive gaps in frontier reasoning amid Anthropic's deliberate safety-focused cadence. Key catalysts include potential Claude 4 training updates (rumored for early 2025 but unconfirmed) and developer conferences such as possible Anthropic events, though the benchmark's creators note that problems are unsolved even by top humans, tempering expectations for rapid leaps.
Experimental AI-generated summary based on Polymarket data · Updated
Anthropic's Claude score on the FrontierMath Benchmark by June 30?
$47,034 Volume
50%+
54%
This market will resolve according to Epoch AI's FrontierMath benchmarking leaderboard (https://epoch.ai/frontiermath) for Tiers 1-3. Studies that are not included in the leaderboard (e.g. https://x.com/EpochAIResearch/status/1945905796904005720) will not be considered.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Proposed outcome: Yes
No dispute
Final outcome: Yes