Story Detail of id 48464054 | Liveview Hacker News

Pelican for Fable 5 on default settings is a clear improvement on Opus 4.8

Fable 5 default: https://gist.github.com/simonw/036bee5a703e7ec84e34efa974438...

Opus 4.8 (the "max" one is closest to Fable): https://simonwillison.net/2026/May/28/claude-opus-4-8/#and-s...

Now here are the Fable pelicans for all five of the thinking effort levels - low, medium, high, xhigh, max: https://tools.simonwillison.net/markdown-svg-renderer#url=ht...

Low used 25 input, 1,929 output - 9.67 cents: https://www.llm-prices.com/#it=25&ot=1929&sel=claude-fable-5

Max used 25 input, 14,430 output - 72.175 cents! https://www.llm-prices.com/#it=25&ot=14430&sel=claude-fable-...
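
For anyone who wants to check those two cost figures, here is a minimal sketch of the arithmetic. The per-token prices are not stated in the comment; $10 per million input tokens and $50 per million output tokens are assumptions, chosen because they reproduce both quoted totals exactly. The function name `cost_cents` is made up for illustration.

```python
from decimal import Decimal

# ASSUMED prices in USD per million tokens. They are inferred from the two
# totals quoted above, not taken from an official price list.
INPUT_USD_PER_MTOK = Decimal("10")
OUTPUT_USD_PER_MTOK = Decimal("50")


def cost_cents(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the cost of one request in US cents under the assumed prices."""
    usd = (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / Decimal(1_000_000)
    return usd * 100


if __name__ == "__main__":
    # Token counts are the ones quoted for the low and max effort runs.
    print("low:", cost_cents(25, 1929), "cents")
    print("max:", cost_cents(25, 14430), "cents")
```

Under these assumed prices the 25 input tokens contribute only 0.025 cents, so almost all of the difference between low and max comes from the output token count.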

sempron648 hours ago | parent | next

The pelican has looked very same-y across all frontier models, same color bike, same camera angle, etc. I suspect this challenge is already too embedded in the training data to be a good signal when it succeeds, and maybe even when it fails in pathological ways mirroring existing AI pelicans on the internet.

tripleee8 hours ago | root | parent | next

I'd say it's working great for its intended purpose. Keeps Simon on top of all these threads and funnels traffic to his site.

yreg7 hours ago | root | parent | next

I really don't understand what's interesting about this test and why it is always on top.

simonw7 hours ago | root | parent | next

It's funny.

girvo2 hours ago | root | parent

It really is lol

depr6 hours ago | root | parent | next

Same reason you would always see the same top comments on reddit during a certain era.

luqtas1 hour ago | root | parent | next

because you still can't ask LLMs to port DOOM to hardware X or Y

WithinReason6 hours ago | root | parent

It's a meme, and HN loves upvoting memes. Just like Reddit!

port116 hours ago | root | parent | next

The ultimate measure of an LLM is whether it can produce a capable image of a pelican riding a bicycle. All other use cases are but a distraction!

scrollaway7 hours ago | root | parent | next

Do you seriously have a dedicated “bad takes on AI” hn account?

tripleee6 hours ago | root | parent

yeah, although I do combine it with "replies to snarky questions" for efficiency

jurgenaut237 hours ago | root | parent

True that

h4ny6 hours ago | root | parent | next

Was it ever a good test? How do you even objectively assess what a good pelican on a bike is anyway?

fwipsy6 hours ago | root | parent

SVG generation is a good test because it's extremely easy to subjectively assess with visual reasoning where humans are strong. However, pelican on a bike specifically may be overused at this point.

kayge5 hours ago | root | parent | next

Do you think the models are ready for the next level? I believe that would be: Pelican feeding Spaghetti to Will Smith.

stratos1233 hours ago | root | parent | next

I'd be very surprised if this is in the training data given that most models mess it up to this day. E.g. look at the ones from Opus.

quantumwoke7 hours ago | root | parent

Variations of this comment have been posted for over a year. The pelican has now morphed into part of HN culture rather than a legitimate benchmark, but it's still valuable as a meme.

brazukadev6 hours ago | root | parent

it is more an example of gaming (the HN system) than meme.

sarreph8 hours ago | parent | next

I'm beginning to wonder how much of a useful metric the pelican is because surely the frontier labs must be training their models on pelican-artistry because of how well known your test is now?

bensyverson8 hours ago | root | parent | next

Simon has addressed this on virtually every new model release. He also has unpublished alternate prompts. But the larger point is: this is a fun experiment, not a serious and objective benchmark.

refulgentis7 hours ago | root | parent

It's silly and a joke and a surprisingly good benchmark and don't take it seriously but don't take not taking it seriously seriously and if it's too good we use another prompt but don't actually because then it's not the pelican post and there's obvious ways to better it and it's not worth doing because it's not serious.

Only coherent move at this point: hit the minus button immediately. There's never anything about the model in the thread other than simon's post.

stasomatic7 hours ago | root | parent

But what if they are better at flamingos? Are they optimized for pelicans? How about “draw me a four headed owl”? The meme, I get it, but I’d settle for a working bash script, tbh.

wongarsu8 hours ago | root | parent | next

I just run my own benchmark for "draw an SVG with $animal driving $vehicle". I won't post my choice of animal and mode of transport, but there are plenty of uncommon combinations to choose from. So far it's a fun and visually intuitive benchmark that does seem to correlate with model capabilities
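
A private variant of that template could be sketched roughly as follows. The animal and vehicle lists, the `make_prompts` helper, and the exact prompt wording are all hypothetical placeholders, not the commenter's actual choices.

```python
import random
from string import Template

# Hypothetical example lists; swap in your own uncommon combinations.
ANIMALS = ["walrus", "heron", "capybara", "tapir"]
VEHICLES = ["tractor", "forklift", "hovercraft", "steam train"]

# Mirrors the "$animal driving $vehicle" shape quoted above.
PROMPT = Template("Draw an SVG with a $animal driving a $vehicle")


def make_prompts(n: int, seed: int = 0) -> list[str]:
    """Pick n distinct animal/vehicle pairs, reproducibly for a given seed."""
    rng = random.Random(seed)
    pairs = [(a, v) for a in ANIMALS for v in VEHICLES]
    chosen = rng.sample(pairs, n)
    return [PROMPT.substitute(animal=a, vehicle=v) for a, v in chosen]


if __name__ == "__main__":
    for prompt in make_prompts(3):
        print(prompt)
```

Fixing the seed keeps the same prompts across model releases, so the outputs stay comparable without the prompts ever being published.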

modriano8 hours ago | root | parent | next

I don't know. Just looking at the bike frames (specifically the fact that the AI generated bikes have rather unsteerable front forks), it's clear to me that frontier labs aren't spending much time tuning models to make bikes look coherent, which I assume is an easier task than making a pelican riding a bike look coherent.

notnullorvoid6 hours ago | root | parent | next

The way I see it, the benefit of the benchmark isn't to take Simon's results at face value. It's a template for your own benchmarks that are easy to visually evaluate.

HaZeust8 hours ago | root | parent | next

I've seen this reply to Simon's benchmark for 2 years running now, and yet you still see improvements and objectively-bad results over time from new releases, even when I'm sure every frontier AI team has/had a person at least partially dedicated to better bicycle-pelican SVG outputs. Alas.

sarreph8 hours ago | root | parent | next

I had intended to caveat that: I'm sure I'm not the first person to ask about this!

> you still see improvements

This is expected if they are training their models on it, right?

> objectively-bad results

Keen to learn when this has been the case, i.e. across version increments in major models.

simonw8 hours ago | root | parent

I've written about this a couple of times, most notably here: https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

I've been enjoying seeing how the quality of individual models differs based on the amount of reasoning effort you give them. If they were baking in a good pelican you wouldn't expect them to differ so much.

(Google Gemini are the only lab that have very clearly paid attention to the quality of SVG animals-riding-vehicles, see their announcement for Gemini 3.1: https://twitter.com/JeffDean/status/2024525132266688757 )

sarreph8 hours ago | root | parent

Amazing, thank you Simon! Look forward to reading.

384848588 hours ago | root | parent

[flagged]

llm_nerd8 hours ago | root | parent

I honestly assumed their comment was tongue in cheek humour, because positively no one actually cares how these models generate an SVG pelican riding a bicycle. It's some meme thing that this stuff always appears here.

BrokenCogs8 hours ago | root | parent

Yeah this is not a real benchmark, it's just a fun tradition every time a new model is released

pelipost1238 hours ago | root | parent

"fun" / boringly predictable meme thread with 30+ replies already

brazukadev6 hours ago | root | parent

It is telling that people need to create throwaway accounts to criticize simonw's behavior on this website.

iLoveOncall7 hours ago | root | parent

It was a completely useless test even before the labs trained for it.

ealready_value8 hours ago | parent | next

This is the reply I look for in all the new model announcements. It's fun to tell people that I judge models based on pelicans.

pixel_popping8 hours ago | root | parent | next

This is all we need: the moment the pelican puts its leg behind the frame, we are all doomed.

chorkpop8 hours ago | root | parent | next

Now someone post the link about how it’s impossible for humans to draw a bike from memory.

upcoming-sesame6 hours ago | root | parent

I also look for this reply because I like seeing the follow-up reply saying that this is not a benchmark anymore because labs have gotten it in their training data.

That reply never fails to come; it's basically a meme at this point.

raffael_de6 hours ago | parent | next

I find it quite interesting that while the picture looks better the more advanced the model is, apparently none so far "understands" that the pelican's legs are on both sides of the bike / top bar.

LordDragonfang5 hours ago | root | parent

If you scroll to the bottom of the Fable-5 by effort page, Max effort actually gets this correct! (Along with being the only one I've seen so far to make a bicycle frame that matches the shape of what most bikes on Google images look like)

wasabi9910115 hours ago | root | parent

And the only one linked here that includes a bicycle chain!

redox998 hours ago | parent | next

It's interesting that they still get the head tube / handle bar part wrong.

aarjaneiro8 hours ago | root | parent

Or the hands not being wings

ethanlipson8 hours ago | parent | next

How much money do you think they spent fine-tuning on pelican SVG generation?

tarruda8 hours ago | root | parent | next

Not as much as Qwen, since apparently 3.6 35B surpassed Opus 4.7 https://x.com/simonw/status/2044830134885306701

csomar8 hours ago | root | parent

Probably none. They probably have much better targets to optimize for than an SVG pelican or even SVGs in general.

smusamashah5 hours ago | parent | next

Can you please compare it with the code generated for similar-quality pelicans by other models? The code in your first link (Fable 5 Default) looks minimal yet very good.

leecommamichael8 hours ago | parent | next

Looks like Fable constructed the "max" "looking" pelican of the previous model for the "xhigh" output token count of the previous model.

rkuska8 hours ago | parent | next

Is it possible to use the credits from subscription (https://support.claude.com/en/articles/15036540-use-the-clau...) for fable?

ceroxylon3 hours ago | parent | next

Yay, max level actually put one of the legs behind the frame!

382hi8 hours ago | parent | next

I'm pretty sure they're optimizing the models around these sorts of tests.

bergheim6 hours ago | parent | next

Anyone care about these pelicans that always come up anymore?

Clearly at this point they are part of the training data.

They even all look sort of ish the same. Daytime, colors,...

1attice6 hours ago | root | parent

Without being mean, I encourage you to go look at some of simonw's writing on this topic, which he has addressed repeatedly (and IMO satisfactorily.)

I know because I too had this initial take; however, upon analysis, it is not sound.

bergheim6 hours ago | root | parent

I know he is an AI influencer that promotes his blog any chance he gets.

I agree as well that he writes many interesting things.

makingstuffs8 hours ago | parent | next

I could be tripping but I’m sure that is very similar to the Deepseek one from not long ago. Clearly I am too lazy to go and find it for verification.

jerryliu127 hours ago | parent | next

Personally feel like it could be more ambitious with what it creates.

csomar8 hours ago | parent | next

Where is the clear improvement on Fable 5? The tail is misplaced.

mercacona8 hours ago | parent | next

Why always sunny days?

umeshunni8 hours ago | root | parent

Pelicans hate biking in the rain (as do I).

gavinray7 hours ago | parent | next

Fable 5 xhigh actually looks the best to me.

benatkin3 hours ago | parent | next

The way they talked it up, having both legs on one side of the bike is like walking to the car wash

purple-leafy6 hours ago | parent | next

Do we need a pelican every single time a model is released? Beating a very dead horse.

Fun at first, seems disingenuous now. A site funnel

lacoolj3 hours ago | parent | next

dude, the max version looks like it's finally there. handle bar holding with wings, the left leg is behind the frame while the right is in front of it (correctly).

well done anthropic.

david_shi8 hours ago | parent | next

that's a great looking pelican

ge968 hours ago | parent | next

need more Alex Moulton style bikes

simunskxcsckss8 hours ago | parent | next

[flagged]

minimaxir8 hours ago | root | parent | next

You can't tell someone to "get a life" while taking the effort to create a burner account for the sole purpose of insulting someone.

rvz8 hours ago | root | parent | next

I don't really consider that a great benchmark anyway, and we really need better, objective ones instead of these mostly performative, cheatable tests that are also available in the training set.

ilaksh8 hours ago | root | parent

Simon's pelicans are an institution. Are you trying to get banned? Lmao.

brazukadev8 hours ago | root | parent | next

For me it is like if crypto bros were allowed to shill their DAOs and tokens during the crypto/NFT phase.

He is the only person not getting rate-limited for shilling AI all the time.

simonw8 hours ago | root | parent

Pointing out how much the models still suck at drawing pelicans is a funny way to shill them.

toraway8 hours ago | root | parent

Tbf the first line of your first comment is:

  > Pelican for Fable 5 on default settings is a clear improvement on Opus 4.8

And doesn't contain any actual criticism within the comment (your blog post might, but just referring to what was posted on HN, which is a bit booster-y on its own).

simonw7 hours ago | root | parent

The entire pelican benchmark is a joke. The joke is that, for all of the billions of dollars poured into these things and the claims of PhD level intelligence, they still draw pelicans not-much-better than a five year-old would.

I don't spell that joke out in every comment I post here because that wouldn't be very funny.

rob8 hours ago | root | parent

I think it's a clever thing he did to basically guarantee he continues to get major traffic to his blog here every time a model is released, especially since he's taking sponsorships with a static banner at the top of every page now. I think he's trying to go the Daring Fireball route.

kylehotchkiss8 hours ago | parent

How many barrels of oil are burned per pelican at Fable levels?
