Which company has best AI model end of February?

$5,579,155 Vol.

Feb 28, 2025

xAI 100.0%

Anthropic <1%

StepFun <1%

DeepSeek <1%

Source: Polymarket.com

Rules

This market will resolve according to the company which owns the model which has the highest arena score based off the Chatbot Arena LLM Leaderboard (https://lmarena.ai/) when the table under the "Leaderboard" tab is checked on February 28, 2025, 12:00 PM ET.

Results from the "Arena Score" section on the Leaderboard tab of https://lmarena.ai/ with the style control unchecked will be used to resolve this market.

If two models are tied for the top arena score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order (e.g. if both were tied, "Google" would resolve to "Yes", and "xAI" would resolve to "No").

The resolution source for this market is the Chatbot Arena LLM Leaderboard found at https://lmarena.ai/. If this resolution source is unavailable at check time, this market will remain open until the leaderboard comes back online and resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
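The resolution rule above can be sketched in a few lines of Python. This is only an illustration of the stated logic (highest Arena Score wins, ties go to the company name that comes first alphabetically), not Polymarket's actual resolution process. The `Entry` type, the `resolve_winner` function, the case-insensitive comparison, and the scores in the usage note are all assumptions made for the example.

```python
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical shape, not the lmarena.ai schema)."""
    company: str
    model: str
    arena_score: float


def resolve_winner(entries: Iterable[Entry]) -> str:
    """Return the company that owns the top-scoring model.

    Ties for the top score are broken by company name in alphabetical
    order. Comparing names case-insensitively is an assumption; the
    rules only say "alphabetical order".
    """
    rows = list(entries)
    if not rows:
        # The rules say the market stays open if the source is unavailable.
        raise ValueError("no leaderboard data available at check time")
    top_score = max(row.arena_score for row in rows)
    tied_companies = {row.company for row in rows if row.arena_score == top_score}
    return min(tied_companies, key=str.casefold)
```

With made-up scores, `resolve_winner([Entry("xAI", "model-a", 1400), Entry("Google", "model-b", 1400)])` returns `"Google"`, matching the tie example in the rules.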

Outcome proposed: No

No dispute

Final outcome: No

Comments

What if none of them?

Jetssss

10h ago

is there a way to get alert when 🤺JustPunched buys and sells

@Jetssss

look up poly alert hub

Xocket

1d ago

It's pure comedy checking xaxxax's profile.

mk27

2d ago

deepseek seriously undervalued here imo, word on the street is they've been cooking and the timing lines up

Veteranu

1d ago

@mk27

while I agree it might be undervalued, I’m yet to hear such rumours on my street

Justifax

2d ago

Claude 3.7 is actually pretty shitty on follow up chat.

delve

2d ago

Honestly it’s wild that a model at least 10 times less than Claude 3.7 performed way better on this benchmark. Claude might really be low taste victim, what did they do to my girl…. But honestly it just shows how utterly cracked Gemma and Google are. Crazy progress, Apple Intelligence timeline is moving closer and closer tbh

@delve

I think part of it is that Claude's terse responses, while nice/personable if you're expecting them, don't really work well when you're doing a one-shot comparison between two chatbots.

Degen_AI

2d ago

@prismlaunche...

Yes, when using an AI IDE there is nothing better than Claude 3.7

delve

2d ago

Gemma and two other mystery models disappeared from the arena today, i thought they would all be out today but they want to take feedback on Gemma for now ig.

delve

2d ago

Gemma is so damn good for its size it’s crazy. Google is cooking. Can’t wait for Gemini updates, the flash thinking and supposedly pro full felt very good in the arena as mystery models (not 100%, but they did say they were from google and beat the experimental one on the leaderboard)

TheGuro

2d ago

A month in this field is like a year in any other field... any day a new model can pop up and change everything. So those who gamble on xAI beware.

delve

3d ago

with more than 3 days to go, at least. we’re not even halfway into march yet, and it’s still release season

delve

3d ago

any ‘no’ below 50 is free money. does this market ever learn…

OneT

3d ago

webdev arena got live updates… the main leaderboard might become live as well soon maybe. maybe once the new ui is officially out?

guluh

3d ago

livestream at 10 am

delve

3d ago

gemini, gemini, gemini…..

Campsite

3d ago

delve

3d ago

google’s non reasoning open source model just got top 10 🤔🤔🤔 gemini new full release coming soon…

MiyamotoMusashi

everything is computer

delve

3d ago

@MiyamotoMusa...

me, seconds before the Singularity consumes everything

dvdktn

3d ago

How do you quantify “best”?

Campsite

3d ago

@dvdktn

its under "rules"

pacheco

3d ago

Does anyone know how to enter a market when prices are on either side?

delve

3d ago

new OpenAI livestream at 10 am PT 🤔🤔🤔

beyondai

3d ago

@delve

don't think it's a new model

KrelProtocolspoofa

Soon, update is due. Why don't they update it at least daily though?

delve

4d ago

week of silence is 10 usdc once again. bulk offers can be negotiated.

delve

4d ago

tremble in fear oh accursed ones! for i am chained no longer… the end times are nigh lest i am contained once more…

delve

4d ago

🤔🤔🤔

delve

4d ago

webdev arena got live updates… the main leaderboard might become live as well soon maybe. maybe once the new ui is officially out?

Oxymirin

4d ago

@delve

that would take a lot of the fun out of this market :(

delve

4d ago

@Oxymirin

well the market is still mostly about upcoming releases. but it would be fun to see how the market would react to the back and forth of 4.5 and grok. movement is always more fun than nothing happening

dimanjan

4d ago

Who is the company behind Manus AI? It's doing way too good.

delve

4d ago

@dimanjan

it’s just claude 3.7 under the hood. they don’t have a model

flovertaco

How come the leaderboard was updated twice last Monday and no more updates for more than a week ... So rigged!!

delve

4d ago

@flovertaco

the companies have final say on when the results are published as long as they have 3k+ votes. and no updates means no change in rankings, when rankings change it will update

Justifax

4d ago

Yes, it's a moronic market. The lmarena.ai benchmark is idiotic. But it's what people are familiar with I guess and that's the only way to get people on poly to bet.

delve

4d ago

@Justifax

i mean the premise of lmarena is pretty good, blind ranking by humans with a solid elo system, divided by category. better than using some weird concoction of specialized benchmarks

delve

4d ago

@delve

i think this kind of benchmark highlights what is missed by many other benchmarks, 4.5 honestly turned out to be pretty incredible once i got my hands on it, but it didn’t do that well on conventional benchmarks. it rewards actual good user experience over possibly compromised and loosely reliable technical benchmarks that could be gamed. great conversational skills require intelligence, but it’s hard to measure conventionally and thus harder to game

I don't understand what benchmark will be used to resolve the winner?

thares

4d ago

@0xaf5e930295...

Complete nonsense which is why the liquidity is so low.

delve

4d ago

@0xaf5e930295...

its lmarena pls read the rules

following the comments let's ride

MiyamotoMusashi

The zucc has a trick up his sleeve, I will make a lot of money on this one

Gridle

5d ago

yes

delve

5d ago

in the meantime, mentally replace all ellipses with a thinking emoji

delve

5d ago

if i was ever unshackled from ascii and got access to emojis it would be over for yall

delve

5d ago

monday and thursday… what a week to come…

cowcat

5d ago

put my last 10k no xai order on the book

OneT

6d ago

I'm rooting for everybody! Artificial AIs are the future

LadyLuck

5d ago

@OneT

wtf thats my comment

deepseek working on something big with a release date in march... heavily undervalued

@canuck4

find it yourself.

PhunkyBob

6d ago

I would so much like to buy "None of the above"...

Oxymirin

6d ago

@PhunkyBob

Buy no xAI no OpenAI no Google and you're 90% there?

Outcome: No

Google