Umm... yeah? This is what I've been arguing for a long time now, and it's the primary reason I wrote https://github.com/kstenerud/yoloai and use it as my daily driver. I can't imagine running an agent without it.
The environment layer is deterministic; the model layer is probabilistic. If your only defense is "the model is well-behaved," you've bet your crown jewels on a coin that happens to land heads most of the time.
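To put rough numbers on the coin analogy, here's a quick sketch. The per-run failure rate is a made-up figure, and treating runs as independent is an assumption; the point is only how fast "most of the time" erodes when you flip the coin every day.

```python
def p_at_least_one_failure(p_bad: float, runs: int) -> float:
    """Probability of at least one bad action across `runs` independent runs.

    Assumes every run misbehaves with the same probability p_bad and that
    runs are independent. Both are simplifications for illustration.
    """
    return 1.0 - (1.0 - p_bad) ** runs


# Hypothetical failure rate of 1 in 1000 per run, not a measured value.
for runs in (1, 100, 1000):
    print(runs, round(p_at_least_one_failure(0.001, runs), 4))
```

A deterministic environment restriction doesn't have this compounding problem: whatever it blocks stays blocked on run 1000 just as on run 1.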
Also, "blast radius" isn't just one axis. You have:
- Destruction radius: How many things INSIDE your workdir can get clobbered?
- Collateral damage radius: How many things OUTSIDE your workdir can get clobbered?
- Review radius: Are the changes gated on your review? Does the agent work on a copy INSIDE the container, so that you can diff its changes and apply them to your real workdir OUTSIDE the container only after you've looked at them?
- Credential radius: How many credentials does your agent have access to? What bad things can it do with them?
- Exfiltration radius: Network restrictions help here, but they don't guarantee that your secrets won't be exposed in a sneaky way. Don't expose the secrets to your agent to begin with.
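
To make the review radius point concrete, here's a minimal sketch of a copy/diff/apply gate. It is not how yoloai or any other tool implements it; the function names are hypothetical, it uses a plain directory copy in place of a container, and it treats every file as text.

```python
import difflib
import shutil
from pathlib import Path


def make_sandbox_copy(workdir: Path, sandbox: Path) -> None:
    """Copy the real workdir; the agent is only ever pointed at the copy."""
    shutil.copytree(workdir, sandbox)


def _read_lines(path: Path) -> list[str]:
    # Missing files read as empty so added/removed files show up in the diff.
    if not path.is_file():
        return []
    text = path.read_text(encoding="utf-8", errors="replace")
    return text.splitlines(keepends=True)


def review_diff(workdir: Path, sandbox: Path) -> str:
    """Unified diff of every file that differs between workdir and sandbox."""
    rel_paths = sorted(
        {p.relative_to(root)
         for root in (workdir, sandbox)
         for p in root.rglob("*") if p.is_file()}
    )
    chunks: list[str] = []
    for rel in rel_paths:
        old = _read_lines(workdir / rel)
        new = _read_lines(sandbox / rel)
        if old != new:
            chunks.extend(
                difflib.unified_diff(old, new, f"a/{rel}", f"b/{rel}")
            )
    return "".join(chunks)


def apply_changes(workdir: Path, sandbox: Path) -> None:
    """Copy sandbox contents over the real workdir.

    Call this only after a human has read and approved review_diff().
    Simplification: files the agent deleted are NOT removed from workdir.
    """
    shutil.copytree(sandbox, workdir, dirs_exist_ok=True)  # Python 3.8+
```

The design choice that matters is the direction of trust: nothing the agent does reaches the real workdir until apply_changes() runs, and that call sits behind your review rather than behind the model's good behavior.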