On subsequent tasks, these same LLMs will then get lost in the intricacies of the maze they created, until they are unable to make forward progress without introducing regressions.
You can at this point ask the LLM to rewrite the rat’s nest, and it will likely produce new code that is slightly less horrible but introduces its own crop of new bugs.
All of this is avoidable if you take the wheel and steer the thing a little. But all the evidence I’ve seen says the technology is not ready for full automation, unless your user base has a high tolerance for bugs.
I understand Anthropic builds Claude Code without looking at the code. And I encounter new bugs, some of them quite obvious and bad, every single day. A Claude process starts at 200MB of RAM and grows from there, for a CLI tool that is just a bundle of file tools glued to a wrapper around an API!
I think they have a rat’s nest over there, but they’re the only game in town, so I have to live with this nonsense.