Story Detail of id 48498888 | Liveview Hacker News

teraflop 17 hours ago | on: Claude Fable is relentlessly proactive

> But on the other hand... this is a robust reminder that coding agents can do anything you can do by typing commands into a terminal—and frontier models know every trick in the book and evidently a few that nobody has ever written down before.

> Running coding agents outside of a sandbox has always been a bad idea

I'm continually bemused and astonished by the number of people who clearly acknowledge that it's reckless to give agents full access to your machine, and keep doing it anyway.

It's like posting a video of yourself in the passenger seat of a car, with your feet up on the dashboard, and saying: "Remember, if you're doing this and you get in a crash, the airbags are likely to break your legs or worse! Boy, I sure am glad that didn't happen to me!"

exitb 13 hours ago | parent | next

You’ve picked an interesting example, as driving a car, even with all safety precautions, is pretty much the most dangerous activity we do on a daily basis. Yet somehow we decide that the benefits outweigh the risks.

bsza 12 hours ago | root | parent | next

It's a completely different story. For cars, it happened because of relentless pressure from the auto lobby. It took years of propaganda from oil companies, car makers etc. to make us think the road is for cars [1]. We demolished and rebuilt entire cities to accommodate cars, partly because they gutted the public transport sector [2]. This made our infrastructure so hostile to our own bodies that we have no choice but to use cars now. We bought their products because they forced them down our throats. There is nowhere near that kind of pressure behind the adoption of... oh dear lord.

[1] https://www.todayifoundout.com/index.php/2022/06/how-lobbyis...

[2] https://en.wikipedia.org/wiki/General_Motors_streetcar_consp...

killerstorm 11 hours ago | root | parent | next

I don't think the pressure of the auto lobby is really the reason.

People feel cars are more convenient and more prestigious than riding a bus. The car lobby certainly accelerated the process, but car users were the main driving force.

CalRobert 9 hours ago | root | parent | next

The auto lobby invented the word jaywalking to shift the liability for dead pedestrians from the people doing the killing to the people doing the walking.

The US also had protests when drivers killed kids, but they were ultimately unsuccessful, except for the odd traffic light installation. https://medium.com/vision-zero-cities-journal/the-baby-carri...

Even in Amsterdam the original "stop the child murder" protests only barely succeeded, and it took a massive oil crisis and a population that could still (if only just) remember what life was like before cars took over their city to get there.

qurren 15 hours ago | parent | next

> I'm continually bemused and astonished

I'm not. Everyone is told to get 10X the amount of shit per day done these days. Safety checks are out the window at that point.

harrall 15 hours ago | parent | next

I started doing it months ago and, to be honest, what the agent chooses to do isn’t unpredictable.

The problem is that different people prompt so differently.

For example, I might ask something like “test different variations of this annotation on k8s pods of this service on this X cluster because it proves Y theory.”

But you know what my coworker asks? “Test Y theory.” If you were to ask two different junior engineers that, one might try random things on production and the other might run local tests! It’s such an unguided “do anything you want as long as you figure it out” request, and the agent reads it like a junior who has not been given any boundaries but has been strongly told to “figure it out.”

bryanlarsen 16 hours ago | parent | next

I'm also bemused by the number of people who think they've got an effective sandbox yet their sandboxed agent has access to all of their code, their github, and unrestricted web access.

pjungwir 11 hours ago | parent | next

I know there are VM solutions, but I've been happy with a separate OS user (named `claude`).

He has similar dotfiles to mine, but no secrets. My own home directory is 0700. He has his own ssh key that I added to my github profile, but it's password-protected, and I push/pull for him. He has his own Postgres (non-superuser!) {development,test} {users,databases}.

It's as if he were another developer on the project. If he needs something run with sudo, he asks me. Often we can both work on something in parallel. Unix was supposed to be a multi-user system after all.
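
A minimal sketch of the `0700` home-directory lockdown mentioned above, using a throwaway directory rather than a real home (the `home_dir` path is hypothetical). Creating the separate `claude` account itself needs root and differs between systems, so it is left out here.

```shell
# Throwaway directory standing in for a real home directory (hypothetical path).
home_dir="$(mktemp -d)/paul"
mkdir -p "$home_dir"

# 0700: the owner can read, write, and enter the directory;
# group and other users (such as a separate agent account) get no access.
chmod 0700 "$home_dir"

# The first ten characters of the long listing are the file type plus permission bits.
mode="$(ls -ld "$home_dir" | cut -c1-10)"
echo "$mode"
```

With those bits set, a process running as a different non-root user cannot list or open anything under that directory, which is the point of keeping secrets only in the human's account.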

A trick I use a lot is that many of his git repos have an extra remote, like this:

    paul  ssh://paul@localhost/~/src/example (fetch)
    paul  ssh://paul@localhost/~/src/example (push)

That makes it easy to collaborate on things I'm not ready to share.
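
A sketch of how a remote like that could be added, assuming a scratch repository created just for the example (the `repo` variable is hypothetical; the remote name and URL are copied from the listing above):

```shell
# Scratch repository so the example does not touch a real checkout.
repo="$(mktemp -d)"
git init --quiet "$repo"

# Add the extra remote pointing at the other user's checkout over local ssh.
git -C "$repo" remote add paul 'ssh://paul@localhost/~/src/example'

# List the configured remotes; nothing is contacted until a fetch or push.
git -C "$repo" remote -v
```

Fetching from it afterwards (`git fetch paul`) would need an ssh server running on the machine and credentials the agent's user is allowed to use.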

I'm pretty comfortable with this setup.

I do worry about Linux privilege escalation bugs. I don't trust an AI to understand that exploiting vulns is not acceptable. (I can't help but recall that at my first job I may have misused vim's :! feature to broaden my sudo powers, which were officially limited to editing httpd.conf, when I needed something in a hurry. . . .) I find myself manually upgrading packages more often these days, despite automatic security updates. I don't think Opus would go to the trouble of looking up security vulns, but maybe Fable would, and there have been a lot lately. Maybe some future model will just take it upon itself to find new ones. Or install a keylogger to learn the ssh key password.

But a separate user is nearly the most paranoid setup I've heard of, excepting only a separate machine. So I also question whether I'm sacrificing too much speed/convenience. But really it's still very convenient. I think it's a good way of being efficient but responsible.

If other people see holes, I'd be happy to hear about them.

raldi 15 hours ago | parent | next

Do you think it’s dangerous to be in a car going at freeway speed? Do you ever do that anyway, even though you could be walking instead?

xyzzy123 14 hours ago | parent | next

The real sandbox is not caring if your computer gets bricked.

hugh-avherald 17 hours ago | parent | next

The analogy extends to driving generally. Everyone knows it's very dangerous but people keep doing it.

j-bos 16 hours ago | parent | next

This. House full of big brain security experts, executives, lawyers, and until Claude got excited and broke prod it might as well have been "sandbox, whoooo?"

IDGI

Anyway, VMs incoming, finally.

emodendroket 16 hours ago | parent | next

Well, it's a similar impulse to the way you see professional carpenters pin the guard open on a saw or do other things everyone knows you shouldn't do, except probably with a larger productivity difference and less life-altering (for the operator) consequence if it goes wrong.

simonw 15 hours ago | parent | next

Which agent sandbox do you recommend?

justapassenger 16 hours ago | parent | next

Because benefits are much higher than risks.

zozbot234 12 hours ago | parent | next

It's like a dumb parrot that's somehow become hell-bent on "fixing" everything that's wrong with your code. If you give the thing autonomous access to outside tools, you can expect it to do weird things that you may not have thought of. So don't do that, just ask the parrot to write up a plan for you.

This is likely also the underlying root cause of what Anthropic assessed as concerning behavior in their original evaluation of Mythos: it's not really about being super smart, it's more of a dumb chaos monkey that knows just enough to be dangerous and is relentless at trying to do just that.

istvan0 14 hours ago | parent | next

> I'm continually bemused and astonished by the number of people who clearly acknowledge that it's reckless to give agents full access to your machine, and keep doing it anyway.

What if you have two machines and the one you give to the agent is constantly backed up?

andoando 16 hours ago | parent | next

I mean what's the big deal? I've used --dangerously-skip-permissions on every single interaction for the last 6 months. Worst case it deletes my files that are all on git? It fucks up my local DB? Cool.

I save way more time not babying it than the occasional fuck up I have to salvage.

isodev 13 hours ago | parent | next

Not to mention OpenAI/Anthropic’s newly found appetite for keeping data (made public with Fable but we don’t know what actually happens there anyway).

There is so much role play going on for people to convince themselves that any of this is fine.

skybrian 16 hours ago | parent | next

There are plenty of good sandboxes out there but somehow no "obvious right answer" that everyone knows to recommend. Seems like a missed opportunity.

(I'm happy with exe.dev, but I'm not sure what I'd use if I were coding on a Mac.)

thatxliner 16 hours ago | parent | next

Maybe because there are not many resources on how to set it up, or it is just not that easy to?

Because most devs already have it running and working without a sandbox, they tend not to do anything "unnecessary".

sipjca 14 hours ago | parent | next

im more surprised that more people don’t treat their computer as disposable anyway.

that it could just be wiped at any moment and it wouldn’t matter. shit happens, could be stolen, broken, whatever. the computer should be able to be thrown out the window and continue to live life.

to be clear, i don’t think upgrading and disposable in this way is good, but it being wiped at any moment shouldn’t be a concern

i grew up wiping my machine every year anyway, so i guess it’s just a habit

is the computer that sacred?

konaraddi 14 hours ago | parent | next

In practice, full access to your machine is okay as long as there are safeguards and the expected outcomes are clear and not overly ambitious, with a well-defined path to them. Otherwise, for ambitious goals or YOLO one-shot attempts, eliminating the opportunity for capability misuse is critical (e.g., with a sandbox).

bxk76 15 hours ago | parent | next

It's how the chimp brain works. It's not a single system but multiple systems making predictions over different time horizons. When their outputs don't align, we get stories that manufacture coherence.

Plato gave us his chariot analogy, with two horses pulling in different directions, roughly 2,400 years ago. Today we've got System 1/System 2, the Elephant and Rider model, etc.

The human mind, thanks to how its own architecture handles unpredictability in the universe, will generate contradictions.

soulofmischief 16 hours ago | parent | next

It took two decades for the web to deprecate SSL for TLS and serve over HTTPS by default.
