Let's say I have daily backups and get 10x as much done each day by being reckless and risking an "rm -rf", and let's say there's a 1% chance of an "rm -rf". I break even after 2 days of being reckless, even if I get unlucky and it wipes my drive on day 2. I spend days 3 and 4 recovering, and am still 6 days ahead thanks to the 10x work I got done on day 1.
What if I have a 50 day streak of not hitting an "rm -rf"? Early retirement?
I guess the work on day 1 should be to build a proper sandbox and drop the chance of an "rm -rf or worse" even down to 0.001%.
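For anyone who wants to play with the numbers, here is a rough sketch of that arithmetic in Python. The assumptions are mine, not anything measured: the 1% is read as a per-day chance, the day of the wipe produces nothing, recovery takes 2 more days, and daily backups mean earlier days' work survives. The function names are made up for illustration.

```python
SPEEDUP = 10        # reckless output per day, measured in normal days of work
RECOVERY_DAYS = 2   # assumed days spent restoring from backup after a wipe


def worst_case_days_ahead(good_days: int = 1) -> int:
    """Days ahead after `good_days` reckless days, one wipe day, then recovery."""
    work_done = good_days * SPEEDUP
    calendar_days = good_days + 1 + RECOVERY_DAYS  # wipe day itself yields nothing
    return work_done - calendar_days


def expected_daily_output(p_wipe: float) -> float:
    """Long-run output per calendar day if every reckless day risks a wipe.

    Each attempt either succeeds (1 day, SPEEDUP units of work) or wipes the
    drive (1 lost day plus RECOVERY_DAYS, zero work). The long-run rate is
    expected work per attempt divided by expected days per attempt.
    """
    expected_work = (1 - p_wipe) * SPEEDUP
    expected_days = (1 - p_wipe) * 1 + p_wipe * (1 + RECOVERY_DAYS)
    return expected_work / expected_days


if __name__ == "__main__":
    print(worst_case_days_ahead())          # the day-1 / wipe-on-day-2 scenario
    print(expected_daily_output(0.01))      # 1% per-day risk
    print(expected_daily_output(0.00001))   # 0.001% per-day risk with a sandbox
```

Under these assumptions, being reckless only stops beating the normal 1x pace once the per-day wipe chance reaches 75%. The model ignores anything backups can't undo, which is where the "or worse" part comes in.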
Your manager will look at your token usage and the number of Jira tickets you closed, and if you have not increased both 10x in the past year then you will be let go. 10x is the new 1x.
Whether that's early retirement depends on how much money you have.
> Additional bypass examples that all execute without permission:
> echo test ; git rm file.txt
> rm --force --recursive /home (if "rm -rf" is blocked)
I never really dug into the leaked code, but calling that a security layer is a joke.
(And I really don't get why they give it actual shell access either. Implementing a "fake" one for something like a honeypot takes a couple of days, and not much more if it needs to persist/map to actual files.)
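To show why the quoted bypasses work against string-matching checks in general, here is a minimal sketch. It is not the leaked implementation (which I haven't read); the denylist, the allowlist, and both function names are hypothetical.

```python
import shlex

BLOCKED_SUBSTRINGS = ("rm -rf",)       # hypothetical denylist
ALLOWED_FIRST_WORDS = {"echo", "ls"}   # hypothetical "safe command" allowlist


def passes_substring_denylist(command: str) -> bool:
    """Allow the command unless it contains a blocked substring verbatim."""
    return not any(blocked in command for blocked in BLOCKED_SUBSTRINGS)


def passes_first_word_allowlist(command: str) -> bool:
    """Allow the command if its first word looks harmless."""
    words = shlex.split(command)
    return bool(words) and words[0] in ALLOWED_FIRST_WORDS


if __name__ == "__main__":
    # Same effect as "rm -rf", different spelling: the denylist never matches.
    for cmd in ("rm -rf /home", "rm --force --recursive /home", "rm -fr /home"):
        print(f"{cmd!r}: allowed={passes_substring_denylist(cmd)}")

    # The first word is "echo", but the shell runs "git rm" after the ";".
    chained = "echo test ; git rm file.txt"
    print(f"{chained!r}: allowed={passes_first_word_allowlist(chained)}")
```

Patching each spelling one by one doesn't really end, since the shell has plenty of other ways to delete files, which is why the check belongs in a sandbox or a fake shell rather than in a filter on the command string.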
I've had one f up an account by placing 2000 limit orders at the wrong price, but that's another story.
I then saw it run `rm -r results/`, before messaging me: "Now all that's left is for you to upload the successful results, then I'll delete the rest!"
Why did it not upload the files itself, when it had been using the cloud storage CLI during that session? No clue. I do accept that I could have and should have just uploaded the files myself. It would have taken 3 seconds to type.
That happened to me once; I was running one of a few free-tier models in a pi-coding-agent session. The bash tool there is stateless and always begins from the launch directory, but the agent assumed state and executed `rm -rf .` intending to remove a build directory. Instead it removed the whole project tree, including session logs and notes.
This was mostly a matter of amusement for me since I was running the agent inside a bubblewrap sandbox for that very reason, and the project itself was not very important.
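For anyone unsure what "stateless" means here, a small Python sketch of the failure mode. This is not the pi-coding-agent tool itself, just an illustration that assumes each tool call spawns a fresh shell in the launch directory; `run_stateless` is a made-up helper, and it only runs `cd` and `pwd` in a temp directory.

```python
import os
import subprocess
import tempfile


def run_stateless(command: str, launch_dir: str) -> str:
    """Run one command in a fresh shell that always starts in launch_dir."""
    result = subprocess.run(
        ["sh", "-c", command],
        cwd=launch_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    launch_dir = os.path.realpath(tmp)
    os.mkdir(os.path.join(launch_dir, "build"))

    # Call 1: the agent "enters" the build directory...
    run_stateless("cd build", launch_dir)
    # Call 2: ...but the next call starts over in the launch directory,
    # so a follow-up "rm -rf ." would hit the project root, not build/.
    cwd_seen_by_next_call = run_stateless("pwd -P", launch_dir)

    # Chaining inside a single call is the only way the cd takes effect.
    chained = run_stateless("cd build && pwd -P", launch_dir)

    print(cwd_seen_by_next_call)
    print(chained)
```

Passing an explicit path (`rm -rf build`) instead of relying on the working directory would have avoided the problem in either kind of tool.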