"Design for containment at the environment layer first, then steer behavior at the model layer."

Umm... yeah? This is what I've been arguing for a long time now, and it's the primary reason I wrote https://github.com/kstenerud/yoloai and use it as my daily driver. I can't imagine running an agent without it.

The environment layer is deterministic; the model layer is probabilistic. If your only defense is "the model is well-behaved" you've bet your crown jewels on a coin that happens to land heads most of the time.

Also, "blast radius" isn't just one axis. You have:

- Destruction radius: How many things INSIDE your workdir can get clobbered.

- Collateral damage radius: How many things OUTSIDE your workdir can get clobbered.

- Review radius: Are the changes gated on your review? Does the agent work on a copy INSIDE the container, so that you can diff its changes and apply them to your real workdir OUTSIDE the container only once you've approved them?

- Credential radius: How many credentials does your agent have access to? What bad things can it do with them?

- Exfiltration radius: How many of your secrets can leak out? Network restrictions help here, but they don't guarantee that your secrets won't be exposed in a sneaky way. Don't expose the secrets to your agent to begin with.
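
To make the review and credential radii a bit more concrete, here's a rough Python sketch of the idea using only the standard library. It is not how yoloai or any other tool implements it, and a plain directory copy is NOT a sandbox (you still want a container or VM around the agent). Every name in it (`ENV_ALLOWLIST`, `scrubbed_env`, `make_sandbox_copy`, `changed_files`, `unified_diff`, `apply_reviewed`) is made up for illustration, and the diff helper assumes text files.

```python
import difflib
import shutil
import tempfile
from pathlib import Path

# Hypothetical allowlist: the only environment variables the agent gets to see.
ENV_ALLOWLIST = {"PATH", "HOME", "LANG"}


def scrubbed_env(env, allowlist=ENV_ALLOWLIST):
    """Credential radius: keep only allowlisted variables, drop everything else."""
    return {k: v for k, v in env.items() if k in allowlist}


def make_sandbox_copy(workdir: Path) -> Path:
    """Review radius: the agent edits this copy, never the real workdir."""
    sandbox = Path(tempfile.mkdtemp(prefix="agent-sandbox-")) / workdir.name
    shutil.copytree(workdir, sandbox)
    return sandbox


def _files(root: Path):
    """Set of file paths under root, relative to root."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def _content(root: Path, rel: Path):
    """File bytes, or None if the file does not exist on this side."""
    p = root / rel
    return p.read_bytes() if p.is_file() else None


def changed_files(workdir: Path, sandbox: Path):
    """Relative paths that were added, removed, or modified in the sandbox."""
    candidates = _files(workdir) | _files(sandbox)
    return sorted(
        rel for rel in candidates
        if _content(workdir, rel) != _content(sandbox, rel)
    )


def unified_diff(workdir: Path, sandbox: Path, rel: Path) -> str:
    """Human-readable diff for one file (assumes text content)."""
    def lines(p: Path):
        return p.read_text().splitlines(keepends=True) if p.is_file() else []
    return "".join(difflib.unified_diff(
        lines(workdir / rel), lines(sandbox / rel),
        fromfile=f"a/{rel}", tofile=f"b/{rel}",
    ))


def apply_reviewed(workdir: Path, sandbox: Path, approved):
    """Copy only the paths the human approved back into the real workdir."""
    for rel in approved:
        src, dst = sandbox / rel, workdir / rel
        if src.is_file():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
        elif dst.exists():
            dst.unlink()  # the agent deleted this file and you approved that
```

Usage would look something like: launch the agent with `env=scrubbed_env(os.environ)` and its working directory set to the path returned by `make_sandbox_copy(...)`, print `unified_diff(...)` for each entry in `changed_files(...)`, then pass only the paths you're happy with to `apply_reviewed(...)`. Nothing reaches the real workdir until that last call.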