Recent advances in agentic coding have driven strong trader interest in whether leading large language models will hit elevated scores on benchmarks like SWE-bench Verified or Coding Arena by June 30. Anthropic’s Claude Mythos Preview currently tops verified leaderboards near 94 percent through refined tool use and multi-step reasoning, while OpenAI’s GPT-5.3 Codex and Google’s Gemini 3 series sit in the mid-80s to low-90s range on comparable tasks. Competitive pressure from frequent updates, including high-reasoning modes and specialized coding agents released earlier this year, continues to push incremental gains. Traders watch for any new model drops or benchmark refreshes in the coming weeks, as historical patterns show rapid iteration can close remaining gaps before the deadline, though saturation effects on established evals introduce uncertainty.
Experimental AI summary based on Polymarket data. This is not trading advice and does not affect how this market resolves.
1550: 54%
1560: 62%
1570: 11%
$7,817 volume
Results from the "Score" column under the "Text Arena | Coding" Leaderboard tab at https://arena.ai/leaderboard/text/coding-no-style-control with style control off will be used to resolve this market.
The resolution source for this market is the Chatbot Arena LLM Leaderboard found at arena.ai/leaderboard/text. If this resolution source is unavailable at check time, this market will remain open until the leaderboard comes back online and will resolve based on the first check after it becomes available. If permanently unavailable, this market will resolve to "No".
Market opened: Apr 2, 2026, 6:09 PM ET
Resolver
0x65070BE91...
Frequently Asked Questions