Story Detail of id 47053425 | Liveview Hacker News

zmmmmm 1 hour ago | on: Claude Sonnet 4.6

I see a big focus on computer use: you can tell they think there is a lot of value there, and in truth it may be as big as coding if they pull it off convincingly.

However, I am still mystified by the safety aspect. They say the model has greatly improved resistance to prompt injection. But their own safety evaluation [1] says that 8% of the time their automated adversarial system was able to one-shot a successful injection takeover, even with safeguards in place and extended thinking enabled, and 50% (!!) of the time when given unbounded attempts. That seems wildly unacceptable; this tech is just a non-starter unless I'm misunderstanding something.

[1] https://www-cdn.anthropic.com/78073f739564e986ff3e28522761a7...
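To see why even a single-digit one-shot rate worries people, here is a toy calculation. It assumes, purely for illustration, that every attack attempt is an independent draw with the same 8% success rate. That is not how the evaluation computed its numbers (under this model the odds climb toward 100% as attempts grow, unlike the reported 50%). The function names `p_at_least_one` and `attempts_for` are hypothetical.

```python
import math


def p_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent attempts succeeds (toy model)."""
    return 1.0 - (1.0 - p) ** n


def attempts_for(p: float, target: float) -> int:
    """Smallest n such that p_at_least_one(p, n) >= target, assuming independence."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Hypothetical per-attempt success rate taken from the 8% one-shot figure.
for n in (1, 5, 10, 20):
    print(n, round(p_at_least_one(0.08, n), 3))  # 0.08, 0.341, 0.566, 0.811

print(attempts_for(0.08, 0.5))  # 9 attempts to pass even odds
```

Under these assumptions an attacker who can simply retry passes a coin-flip chance of takeover after about nine tries, which is the intuition behind calling the one-shot number alone misleading.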
