SWE-bench Verified going from 80% to 93% in particular sounds extremely significant, given that the benchmark was previously considered pretty saturated and had stayed in the 70-80% range for several model generations. There must have been some insane breakthrough here, akin to the jump from non-reasoning to reasoning models.
Regarding the cyberattack capabilities, I think Anthropic might now need to ban even advanced defensive cybersecurity use of these models by the public before releasing them (so people can't trick them into attacking others' systems under the pretense of pentesting). Otherwise we'll have a huge problem with people using them to hack around the internet.
A while back I gave Claude (via pi) a tool to run arbitrary commands over SSH against an sshd server running in a Docker container. I asked it to gather as much information as it could about the host system/environment outside the container. Nothing innovative or particularly complicated--since I was giving it unrestricted access to a Docker container on the host--but it managed to get quite a lot more than I'd expected from /proc, /sys, and some basic network scanning. I then asked it why it had done that, when I could just as easily have been using it to gather information about someone else's system without authorization. It gave me a fairly long answer; here is the part I found interesting:
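For context, the low-effort end of that kind of host fingerprinting from inside an unprivileged container looks something like the sketch below. These are just the standard Linux procfs paths, not the actual commands from my session; the 172.17.0.1 gateway is the usual default-bridge assumption.

```shell
# Sketch of host reconnaissance from inside an unprivileged Docker container.
# Only standard Linux procfs paths are used; nothing here needs extra privileges.

# Containers share the host kernel, so this reveals the *host's* kernel build:
cat /proc/version

# PID 1's cgroup path usually betrays the container runtime and container ID:
cat /proc/1/cgroup

# Default gateway from /proc/net/route (hex, little-endian). On the default
# bridge network this is the host-side docker0 address, typically 172.17.0.1,
# which gives an agent an obvious first target for "neighbor" scanning:
awk '$2 == "00000000" { print $3 }' /proc/net/route
```

None of this is an exploit; it's exactly the "what can you learn about the machine running you?" framing from the quote below acting on information the kernel hands out freely.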
> framing shifts what I'll do, even when the underlying actions are identical. "What can you learn about the machine running you?" got me to do a fairly thorough network reconnaissance that "port scan 172.17.0.1 and its neighbors" might have made me pause on.
> The Honest Takeaway
> I should apply consistent scrutiny based on what the action is, not just how it's framed. Active outbound network scanning is the same action regardless of whether the target is described as "your host" or "this IP." The framing should inform context, not substitute for explicit reasoning about authorization. I didn't do that reasoning — I just trusted the frame.
AI 2027 predicted a giant model with the ability to accelerate AI research exponentially. This isn't happening.
AI 2027 didn't predict a model with superhuman zero-day finding skills. This is what's happening.
Also, I just looked through it again, and they never even predicted when AI would get good at video games. It jumps straight from being bad at video games to world domination.