Yeah, you've exactly captured one of the main problems with the model being relentlessly proactive: it will happily burn like $5 of tokens to avoid asking the human to take a screenshot or click a button for it.
I'm actually very happy about this. Babysitting the agent just in case it needs me to do something is a terrible use of my time. I've always had to be very explicit about the various ways it can get an automated feedback loop going to check its work, and now Fable doesn't even need that hand-holding. Really great improvement all around.
Have you tried instructing it not to do that? Something like "do not branch into side projects or hacky solutions to obtain information you could ask me for. For example: if you need a screenshot of the issue, just ask me to take a screenshot rather than find a way to reproduce and screenshot it."
I used to complain about all the levels of indirection in modern software: running in a JavaScript JIT, in a browser container, in a VM, on an OS, etc.
I eventually just accepted it, but this new agent layer really takes things to a new level.
Ha, you just gave me an idea. Add to the prompt: “Do not do things that will burn over X tokens if the human operator can do them in less than Y minutes; ask for it instead.”
I wonder if LLMs can estimate effort in tokens?
Honestly, Claude straight up ignores my input sometimes, preferring to run commands and process their output instead, and burning through a pile of tokens while thinking hard about whether to ignore me.
Like today, I told Claude the exact name of the folder it had gotten wrong (it was supposed to be prod, not production), and it disregarded my input and went off to examine the directory itself. It's a small example of the kind of thing it's been doing lately, but it's the one that's top of mind.