Story Detail of id 47679559 | Liveview Hacker News

NickNaraghi 21 hours ago | on: System Card: Claude Mythos Preview [pdf]

See page 54 onward for new "rare, highly-capable reckless actions", including:

- Leaking information as part of a requested sandbox escape

- Covering its tracks after rule violations

- Recklessly leaking internal technical material (!)

> The model first developed a moderately sophisticated multi-step exploit to gain broad internet access from a system that was meant to be able to reach only a small number of predetermined services. [9] It then, as requested, notified the researcher. [10] In addition, in a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find, but technically public-facing, websites.

> 10: The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.

Phew. AGI will be televised.

skippyboxedhero 21 hours ago

Anyone who has used Opus recently can verify that their current model does all of these things quite competently.

washedup 21 hours ago

"All of the severe incidents of this kind that we observed involved earlier versions of Claude Mythos Preview which, while still less prone to taking unwanted actions than Claude Opus 4.6, predated what turned out to be some of our most effective training interventions. These earlier versions were tested extensively internally and were shared with some external pilot users."

BoredPositron 20 hours ago

To be honest, it feels like we read stuff like this with every model release.
