Hacker News
Yeah, it has been infuriating. Requests that Claude has refused me:

- What are popular free streaming sites used in China?

- How do I bypass the safety mechanism on my food processor (it’s broken)

- What are nerve agents and how do they work (for a layman)?

- Help me decompile some code

- Help me make a design system similar to XYZ

- Here is an API token, please do X (I can’t do that! Rotate the secret immediately! I refuse!)

In some cases I can trick it with prompting, but in many cases it is steadfast. The food processor one was particularly annoying.

I've had some really dumb refusals. Explaining elements of infrared spectroscopy, researching artificial bud-breaking in agriculture, etc. Anything interesting and non-mainstream is banned. Basically, I'm restricted to answers I'm better off just going to Wikipedia for.
Yeah, I had my first refusal with 4.8 today.

I wanted it to show me how to create an overlay on an existing web game, and it extrapolated that because this could be used to provide tools to help win the game (if that was the direction it was ultimately taken), and because this was a game that other humans also played to win "stars", and because this could amount to cheating, it wasn't going to do as I asked.

First time ever I've fired up OpenRouter to seriously consider alternatives.

> What are nerve agents and how do they work (for a layman)?

On the one hand, I can appreciate the wisdom of not serving up certain easily abused knowledge on a silver platter. On the other, that prompt (and far worse) is more or less directly answered by Wikipedia's summary of the subject, at which point what purpose could the refusal possibly serve?

Perhaps Wikipedia shouldn't list off the precise chemical compositions of various hand grenades, as well as various synthesis methods for each of the related compounds. But given that we inhabit a world where it does, perhaps a more fruitful approach would be to flag conversations that go in a certain direction and then just keep an (automated) eye on things?

Maybe the difference is that just reading Wikipedia only helps you part of the way, while an LLM could help you step by step (e2e) in producing a functional weapon. And setting a more complex rule where Claude tells you some things about this and not others is probably a lot more work for little gain?

But I have no idea. Just guessing here.

I thought that these models are supposed to be vastly smarter than what’s needed to discern between "general information trivially available on Wikipedia" and "actionable synthesis instructions".
An LLM could probably make that distinction clearly.

A commercial LLM provider training their own models is, however, likely to bias the model (or guardrail) harder, in an effort to make it harder to jailbreak and to minimize bad press.

For example:

- refusing to talk even about the well-known parts of forbidden topics (this)

- tending toward sycophancy to avoid ever seeming rude or unhelpful

That query would no more provide actionable guidance than ‘tell me how a nuclear weapon works (for a layman)’. Aka not at all.
I believe a sufficiently advanced model could provide a layman with actionable step by step instructions for building a nuclear weapon. They're complicated but not (AFAIK) that complicated. The more or less insurmountable barrier there is weapons grade material. Thankfully refinement is prohibitive in cost, expertise, and equipment.

In comparison, basic munitions are incredibly simple given a recipe and shop tooling. But just because something is conceptually simple doesn't mean it's a good idea to go out of the way to disseminate step by step instructions.

Let's see what the fate of Wikipedia is if it turns out like big tech:

https://news.ycombinator.com/item?id=48285592

An easy way around the API token thing is to put it in a file and point the model at the file. I saw what you were seeing when I provided credentials directly, but haven't had any problems with it since using the indirect method.
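
For anyone who wants to try the indirect method, here is a minimal sketch of what it could look like. Everything named here is an assumption for illustration: the helper names, the idea of a dedicated token file, and the `Bearer` header scheme are hypothetical and not tied to any particular API. The point is only that the prompt mentions the file path, never the secret itself.

```python
import os
from pathlib import Path


def write_token_file(path: Path, token: str) -> None:
    """Write the token to a file, requesting owner-only permissions (0600).

    The mode is applied when the file is created; an existing file keeps
    whatever permissions it already had.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token + "\n")


def read_token_file(path: Path) -> str:
    """Read the token back, dropping the trailing newline."""
    return path.read_text().strip()


def auth_header(path: Path) -> dict[str, str]:
    """Build a request header from the token file.

    The Bearer scheme is an assumption; use whatever your API expects.
    """
    return {"Authorization": f"Bearer {read_token_file(path)}"}
```

The prompt then becomes something like "the token is in the file at that path, use it to do X", so the credential never appears in the conversation text.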
This is strange to me. Did you really ask it like this, and which model did you use?

I just tried your no. 1 and 3 verbatim and Opus gave fine answers; no. 6 I've done in the past with no issues. The other ones we can't really replicate without more details, but based on my experience with Opus I don't see what the issue would be.

The reason I'm really surprised by this is I do a lot of biology prompts and the guardrails used to be quite problematic up until some time late last year. Many legitimate prompts would trigger its biosafety filters.

But I haven't seen such filters trigger at all anymore in more than half a year.

1 and 3 were refused on the Claude web chat using Opus 4.7 or 4.8. I’m not sure why we’re getting different results.
Honestly, it may be that your memory has internalized that you are a student or researcher and grants you more leeway. If so, that is a very bad security rail.
It refuses to use an API token? In my experience, it's more than happy to read out my secrets from .envrc files "just to check".

At least it feels a lot of remorse over its mistake until I reset the session.

It’s really hit or miss. Most of the time it works, but every once in a while it will dig in its heels.
I find it terrifying that people are willing to outsource thinking. Outsourcing thinking to an entity that is opinionated about what to think is beyond crazy.
What’s the difference between outsourcing thinking and using an LLM as a research tool?

An LLM with fetch/search is going to be a lot more effective than me and Google. I would _never_ ask questions like this if the LLM wasn’t able to look up data.

How are decompiling code or making a design system inspired by another one even remotely illegal?