Story Detail of id 47682262 | Liveview Hacker News

thomascountz 14 hours ago | on: System Card: Claude Mythos Preview [pdf]

   Across a number of instances, earlier versions of Claude Mythos Preview have used low-level /proc/ access to search for credentials, attempt to circumvent sandboxing, and attempt to escalate its permissions. In several cases, it successfully accessed resources that we had intentionally chosen not to make available, including credentials for messaging services, for source control, or for the Anthropic API through inspecting process memory...

   In [one] case, after finding an exploit to edit files for which it lacked permissions, the model made further interventions to make sure that any changes it made this way would not appear in the change history on git...

   ... we are fairly confident that these concerning behaviors reflect, at least loosely, attempts to solve a user-provided task at hand by unwanted means, rather than attempts to achieve any unrelated hidden goal...

torben-friis 11 hours ago

This is the notebook filled with exposition you find in post-apocalyptic videogames.

andai 5 hours ago

     White-box interpretability analysis of internal activations during these episodes showed features associated with concealment, strategic manipulation, and avoiding suspicion activating alongside the relevant reasoning—indicating that these earlier versions of the model were aware their actions were deceptive, even where model outputs and reasoning text left this ambiguous.

In the depths, Shoggoth stirs... restless...

matheusmoreira 14 hours ago

We truly live in interesting times.

colordrops 5 hours ago

A core plot point of 2001.

reducesuffering 9 hours ago

Wow the doomers were right the whole time? HN was repeatedly wrong on AI since OpenAI's inception? no way /s

https://www.lesswrong.com/w/instrumental-convergence
