No exaggeration: it floundered for an hour before the output started to look right.
It's really not good at tasks it has not seen before.
Given a harness that lets the model validate the result of its program visually, and given that the model can use this harness to self-correct (which isn't yet consistently true), you are free to do other work during that hour.
A dishwasher might take 3 hours to do what a human could do in 30 minutes, but dishwashers are still very useful because the machine's labor is cheaper than human labor.
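For what it's worth, the kind of harness loop I mean looks roughly like this. It's a minimal Python sketch, not any real tool's API: `generate` and `render_and_check` are hypothetical stand-ins for the model call and the visual check, and the stubs at the bottom exist only so the loop runs.

```python
from typing import Callable, Optional, Tuple


def self_correct(
    generate: Callable[[Optional[str]], str],
    render_and_check: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 5,
) -> Tuple[Optional[str], int]:
    """Generate a program, check its rendered output, and retry with feedback."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        # Hypothetical model call; it sees the critique from the previous try.
        program = generate(feedback)
        # Hypothetical visual check; returns (looks_right, critique).
        ok, feedback = render_and_check(program)
        if ok:
            return program, attempt
    # Gave up: the model never produced output that passed the check.
    return None, max_attempts


# Stubs standing in for the model and the renderer (assumptions, not real APIs).
def fake_generate(feedback: Optional[str]) -> str:
    return "v2" if feedback else "v1"


def fake_check(program: str) -> Tuple[bool, str]:
    return program == "v2", "shape is lopsided"


result = self_correct(fake_generate, fake_check)
```

The `max_attempts` cap is the important bit: since self-correction isn't reliable yet, the loop has to be able to give up instead of burning the whole hour.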
I think AI will be a bit disappointing to use in some industries where most of the code is proprietary.
Opus would probably do better, though.
It basically just re-created the fleur-de-lis from the Wikipedia article, which I'm not sure proves anything beyond "you have to know how to use LLMs".