Story Detail of id 48464100 | Liveview Hacker News

modeless 8 hours ago | on: Claude Fable 5

Claude Fable 5 beats Pokémon FireRed using only vision: https://www.youtube.com/watch?v=CIQBP1w4B1M

hi, pokemon red expert here: that video has since been taken private. there is what i assume to be a new version of that video posted here https://www.youtube.com/watch?v=Ty_50J84fMY but it is heavily redacted, with most of the game omitted. very possibly this is just another case of anthropic protecting us from their models' immense power

uludag 8 hours ago | parent | next

Any suggestion on how I should calibrate my cynicism towards this?

I can imagine Anthropic running this experiment multiple times and picking the most impressive one. Or I could imagine this particular run costing $1000+ in tokens. Or maybe they tried a bunch of Pokemon games and it couldn't even finish some of them. Or is it just able to do this because it has an immense amount of FireRed training data, and if you were to give it an "original" Pokemon game, where it actually had to navigate novel circumstances, it would fail?

modeless 7 hours ago | root | parent

Every model has encyclopedic knowledge of Pokémon FireRed, of course. Knowledge is not ability. This is the first model with the ability to apply that knowledge to beat the game without assistance.

I highly doubt they focused on FireRed specifically in pretraining or posttraining. But we'll see when the ARC-AGI-3 results come out. That will measure its performance on unseen games. Based on this I expect the ARC-AGI-3 score to be SOTA.

milkkarten 7 hours ago | parent | next

no reasoning shown. no explanation on any training information. Using vision-only should be an easier version of the task (given training).

there are many standardized evals to do this correctly and Anthropic ignored them to provide an 18-second sped-up video of a 50-hour run?

yeah I don't trust this until they provide a live run by a 3rd party with full reasoning traces in real time. The reason we all liked the Gemini Plays Pokemon style runs was that they were live and couldn't be faked

svcphr 8 hours ago | parent | next

Bold move putting in the lvl 3 Pidgey against Gary's Blastoise at the end there (~14sec in... integer timestamps insufficient here).

charcircuit 3 hours ago | parent | next

The video is privated now, but the timelapse is weird. Sometimes it skips only seconds before the next screenshot and sometimes it skips probably hours forward.

suddenlybananas 8 hours ago | parent | next

Is there any more detail about this besides the very fast slideshow?

modeless 8 hours ago | root | parent

Seems like the harness was minimal with no extra game state or maps available. Apparently just the screen image. Seems like it took 50 hours of in-game time, which according to Google is at the high end of a normal human playthrough. No idea how long it took in real time though.

hmokiguess 5 hours ago | parent | next

"Computer system goes through a finite state machine"

ex-aws-dude 8 hours ago | parent | next

I mean that’s AGI confirmed right?

ml-doom 2 hours ago | parent

[dead]
