Hacker News
My sense is that the code generation is fast, but then you always need to spend several hours making sure the implementation is appropriate, correct, well tested, based on correct assumptions, and doesn't introduce technical debt.

You need to do this when coding manually as well, but the speed at which AI tools can output bad code means it's so much more important.

And it’s slower to review because you didn’t do the hard part of understanding the code as it was being written.
You're holding it wrong.

Set the boundaries and guidelines before it starts working. Don't leave it space to do things you don't understand.

i.e. enforce conventions, set specific and measurable/verifiable goals, and define skeletons of the resulting solution if you want/can.
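One concrete way to "define the skeleton" before the agent starts is to hand it a typed contract it must implement, so "done" is verifiable rather than open-ended. A minimal sketch in Python (the names `ImageIndex`, `add`, and `nearest` are illustrative, not from the comment):

```python
from abc import ABC, abstractmethod

# Illustrative contract: the agent fills in a concrete subclass, but the
# names, signatures, and types are fixed up front, so success is measurable.
class ImageIndex(ABC):
    @abstractmethod
    def add(self, image_id: str, embedding: list[float]) -> None:
        """Index one embedding under an id."""

    @abstractmethod
    def nearest(self, embedding: list[float], k: int = 10) -> list[str]:
        """Return the ids of the k most similar indexed embeddings."""
```

Because the interface is abstract, an incomplete implementation fails immediately at instantiation time instead of at review time.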

To give an example: I do a lot of image similarity work, and I wanted to test Redis VectorSet while it was still in beta, but the PHP extension for Redis (the fastest one, written in C as a proper language extension rather than a runtime lib) didn't support the new commands. I cloned the repo, fired up Claude Code, and pointed it at a local copy of the Redis VectorSet documentation I'd put in the directory root, telling it I wanted the extension updated to support the new commands I'd need to handle VectorSets. This was, idk, maybe a year ago, so not even Opus. It nailed it. But I chickened out of pushing that into a production environment, so I then told it to just write me a PHP runtime client that mirrors the functionality of Predis (a pure-PHP Redis client) but does so via shell commands executed by PHP (lmao, I know).
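A rough sketch of that "shell out instead of a C extension" idea, in Python rather than PHP for brevity. The command shapes follow the public VectorSet docs (VADD / VSIM with VALUES); a local `redis-cli` and a VectorSet-capable Redis are assumed:

```python
import subprocess

def vadd_argv(key: str, vector: list[float], element: str) -> list[str]:
    # VADD key VALUES <dim> <v1> ... <vn> <element>
    return (["redis-cli", "VADD", key, "VALUES", str(len(vector))]
            + [str(v) for v in vector] + [element])

def vsim_argv(key: str, vector: list[float], count: int = 10) -> list[str]:
    # VSIM key VALUES <dim> <v1> ... <vn> COUNT <count>
    return (["redis-cli", "VSIM", key, "VALUES", str(len(vector))]
            + [str(v) for v in vector] + ["COUNT", str(count)])

def run(argv: list[str]) -> str:
    # Actually executes the command; requires redis-cli on PATH and a
    # Redis server new enough to support VectorSets.
    return subprocess.run(argv, capture_output=True, text=True,
                          check=True).stdout
```

Spawning a process per command is obviously slow compared to a real client, which is part of the joke in the comment; the point is that mirroring an existing API (Predis here) gives the agent a well-defined target.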

Define the boundaries, give it guard rails, use design patterns and examples (where possible) that can be used as reference.

They aren't holding it wrong, it's a fundamental limitation of not writing the code yourself. You can make it easier to understand later when you review it, but you still need to put in that effort.
Work in smaller parts then. You should have a mental model of what the code is doing. If the LLM is generating too much you’re being too broad. Break the problem down. Solve smaller problems.

All the old techniques and concepts still apply.

Enforce conventions, be specific, and define boundaries… in English?!
Can you not? If not, learn how to. You'll find it helps immensely.
So, in my experience evaluating Opus 4.6 in an existing code base, it has gone like this.

You say "Do this thing".

- It does the thing (takes 15 min). Looks incredibly fast. I couldn't code that fast. It's inhuman. So far all the fantastical claims hold up.

But still. You ask "Did you do the thing?"

- it says oops I forgot to do that sub-thing. (+5m)

- it fixes the sub-thing (+10m)

You say is the change well integrated with the system?

- It says not really, let me rehash this a bit. (+5m)

- It irons out the wrinkles (+10m)

You say does this follow best engineering practices, is it good code, something we can be proud of?

- It says not really, here are some improvements. (+5m)

- It implements the best practices (+15m)

You say to look carefully at the change set and see if it can spot any potential bugs or issues.

- It says oh, I've introduced a race condition at line 35 in file foo and a null correctness bug at line 180 of file bar. Fixing. (+15m)

You ask if there's test coverage for these latest fixes?

- It says "i forgor" and adds them. (+15m)

Now the change set has shrunk a bit and is superficially looking good. Still, you must read the code line by line, and an experienced eye will still find weird stuff happening in several of the functions: redundant operations, resources that aren't always freed up. (60m)

You ask why it's implemented in such a roundabout way and how it intends for the resources to be freed up?

- It says "you're absolutely right" and rewrites the functions. (+15m)

You ask if there's test coverage for these latest fixes?

- It says "i forgor" and adds them. (+15m)

Now the 15 minutes of amazingly fast AI code gen has ballooned into taking most of the afternoon.

Telling Claude to be diligent, not write bugs, or to write high-quality code flat out does not work. And even if such prompting can reduce the odds of omissions or lapses, you still always, always, always have to check the output. It cannot find all the bugs and mistakes on its own. If there are bugs in its training data, you can assume there will be bugs in its output.

(You can make it run through much of this Socratic checklist on its own, but this doesn't really save wall clock time, and doesn't remove the need for manual checking.)
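For what it's worth, the "run the checklist on its own" version can be sketched as a loop over fixed follow-up prompts fed to the CLI in non-interactive mode. Claude Code does have a `-p` print mode, but treat the exact invocation below as an assumption; the checklist wording is paraphrased from the comment:

```python
import subprocess

# The Socratic checklist from the comment, as fixed follow-up prompts.
CHECKLIST = [
    "Did you do the thing, including every sub-thing?",
    "Is the change well integrated with the system?",
    "Does this follow best engineering practices?",
    "Inspect the change set for race conditions and null bugs.",
    "Is there test coverage for the latest fixes?",
]

def checklist_argvs(prompts: list[str]) -> list[list[str]]:
    # One non-interactive CLI call per prompt; `claude -p` is assumed here.
    return [["claude", "-p", p] for p in prompts]

def run_checklist(prompts: list[str]) -> None:
    # Serial on purpose: each answer must land before the next question,
    # which is why this saves typing but not wall-clock time.
    for argv in checklist_argvs(prompts):
        subprocess.run(argv, check=True)
```

As the comment says, this automates the nagging but not the judgment: the line-by-line manual review at the end still has to happen.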

You didn't use plan mode.
I did use plan mode. Plan looked great. Code left something else to be desired.
I've had very consistent success with plan mode, but when I haven't, it's usually been when working with code/features that aren't well defined: no clear design pattern, or some variability in how something could be done within the application. These are the things it really trips up on. Well-defined interfaces help, as does specifically telling it to identify and apply design principles where that seems logical.

When I've had repeated issues with a feature/task on existing code, it often helps to first have the model analyze the code and recommend 'optimizations'. Whether or not you agree/accept, it'll give you some insight into the approach it _wants_ to take. Adjust from there.

It's the same as asking one of your juniors to do something, except now it follows instructions a little bit better. Coding has never been about line generation, and now you can POC something in a few hours instead of a few days or weeks to see if an idea is dumb.