Anthropic's Claude 3.5 Sonnet, released on June 20, currently scores just 1.7% on the FrontierMath benchmark, a demanding set of 179 advanced math problems launched June 6 by Epoch AI and partners, on which no frontier model exceeds 5% accuracy. The low score reflects the benchmark's design, which tests limits beyond standard large language model capabilities; competitors such as OpenAI's o1-preview sit at a similar level of about 2%. Trader sentiment hinges on whether Anthropic releases an update or a new evaluation before June 30, though historical patterns suggest that math benchmark gains require significant architectural advances rather than quick patches. Watch for developer conference announcements or leaks about internal progress that could signal math-focused improvements.
Experimental AI-generated summary based on Polymarket data · Updated
$47,034 Vol.
50% or higher
67%
This market will resolve according to Epoch AI's FrontierMath benchmarking leaderboard (https://epoch.ai/frontiermath) for Tiers 1-3. Studies that are not included in the leaderboard (e.g. https://x.com/EpochAIResearch/status/1945905796904005720) will not be considered.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently Asked Questions