Story Detail of id 48467551 | Liveview Hacker News

gck15 hours ago | on: Claude Fable 5

I'm not sure exactly how the new guardrails work, but I've read enough of the Reddit and Chinese communities focused on jailbreaking models to know that you either have to nerf it to the point where it fires even on "kill the task", or someone (maybe even another LLM) is going to come up with a set of tokens that gets around the defenses.

Nerfed models are really bad for PR, especially when you're staking your company's future on the model being the smartest, most dangerous thing in the world.

So I believe they will ease up on the nerfing/guardrails just enough that bad actors will find a way around them, while good-faith users stay limited on anything dual-use. That's how such restrictions usually work elsewhere, too.

P.S. Yes, "kill the task" did, in fact, result in a refusal AND a warning on my Claude account in Opus 4.8's early days.
