The FrontierMath benchmark, launched June 3 by Epoch AI, challenges frontier large language models with 500 novel math problems exceeding International Math Olympiad difficulty; even OpenAI's o1-preview scores just 9.3%, and Gemini 2.5 Pro lags at 2.9%. xAI's Grok-1.5, released in April, scores 50.6% on the standard MATH benchmark but has no public FrontierMath evaluation, fueling trader uncertainty ahead of the June 30 deadline. Elon Musk recently teased Grok-2's imminent release with enhanced reasoning, positioning xAI against rivals such as OpenAI and Anthropic in an intensifying AI capability race. Key watch: xAI evaluation announcements or model releases before resolution, since timelines often slip in this fast-moving field.
Experimental AI summary based on Polymarket data
25%+: 75%
30%+: 73%
40%+: 60%
50%+: 22%
Volume: $3,171
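The listed outcomes are nested thresholds (a score of 50%+ also counts as 40%+, and so on), so the quoted prices can be differenced to get a rough implied chance for each score range. The sketch below assumes each price can be read directly as a probability and ignores fees, spreads, and thin volume; the function and variable names are hypothetical and not part of any Polymarket API.

```python
# Hypothetical helper for illustration only; not a Polymarket API.
# Each pair is (score threshold in percent, quoted chance that the score is at or above it).
thresholds = [(25, 0.75), (30, 0.73), (40, 0.60), (50, 0.22)]


def implied_buckets(levels):
    """Turn nested 'score >= t' chances into chances for disjoint score ranges.

    Assumes levels are sorted by ascending threshold and that the quoted
    prices are non-increasing, as nested threshold markets should be.
    """
    buckets = {}
    first_t, first_p = levels[0]
    buckets[f"below {first_t}%"] = round(1 - first_p, 4)
    # Difference each adjacent pair of thresholds.
    for (lo, p_lo), (hi, p_hi) in zip(levels, levels[1:]):
        buckets[f"{lo}% to under {hi}%"] = round(p_lo - p_hi, 4)
    last_t, last_p = levels[-1]
    buckets[f"{last_t}% or more"] = round(last_p, 4)
    return buckets


if __name__ == "__main__":
    for label, chance in implied_buckets(thresholds).items():
        print(f"{label}: {chance:.0%}")
```

Under these assumptions, most of the implied weight sits in the 40% to under 50% range, with very little between 25% and 30%.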
This market will resolve according to Epoch AI's FrontierMath benchmarking leaderboard (https://epoch.ai/frontiermath) for Tiers 1-3. Studies that are not included in the leaderboard (e.g. https://x.com/EpochAIResearch/status/1945905796904005720) will not be considered.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market opened: Jan 30, 2026, 12:01 AM ET
Resolver: 0x65070BE91...
Frequently Asked Questions