When you hit a runtime bug, the agent's only tool is "let me add a print statement and restart". That works for simple cases but it's the exact same log-and-restart loop we fall back to in cloud and containerized environments, just with faster typing.
Where it breaks down: timing-sensitive code, Docker services, anything where restarting changes the conditions you need to reproduce.
I've had debugging sessions where the agent burned through 10+ restart cycles on a bug that would've been obvious if it could just watch the live values.
We've given agents the ability to read and write code. We haven't given them the ability to observe running code. That's a pretty big gap.
In fact last night I had it hacking away at a Wordpress template. It was making changes and then checking screenshots from a browser window automatically confirming it's changes worked as planned.