First test question: "Is the UV Index a good proxy for when to wear sunglasses?" It immediately triggered the safety filter... oh dear.
It triggered for me when I asked: "Web search for your own model card (released today) and pick out your favourite highlights from the pdf".
Did not trigger for me (Fable answered the question), so I guess the filters are either non-deterministic or are still being tweaked.
Interesting, I assumed all model routing was done using an LLM (i.e. non-deterministically).
It’s possible that there’s a set of words or phrases that route deterministically to save money on obvious stuff.

I kind of wonder, though, which model they’re using to do the routing. It seems like a huge added cost to run these kinds of checks on every request.

Wasn't it leaked in the Claude Code source that it was all regex?
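For anyone curious what the "cheap deterministic check first, model second" idea from the comments above might look like, here is a minimal sketch. It assumes a two-stage design; the rule list, `should_flag`, and `llm_classifier` are hypothetical names made up for illustration, and none of this reflects what Anthropic or Claude Code actually does.

```python
import re
from typing import Callable

# Hypothetical keyword rules, for illustration only. These are NOT the
# patterns any real product is known to use.
FAST_PATH_RULES = [
    re.compile(r"\b(nerve agent|pipe bomb)\b", re.IGNORECASE),
]


def should_flag(prompt: str, llm_classifier: Callable[[str], bool]) -> bool:
    """Two-stage safety check (a sketch, not a real implementation).

    Stage 1 is a deterministic regex scan: cheap, and the same input always
    gives the same answer. Only prompts that no rule matches are passed to
    the hypothetical model-based classifier, which is slower, costs money
    per call, and may not be deterministic if it samples.
    """
    if any(rule.search(prompt) for rule in FAST_PATH_RULES):
        return True  # obvious case: flagged without paying for a model call
    return llm_classifier(prompt)  # ambiguous case: defer to the model
```

With this layout the regex stage can only flag, never clear, so it saves money on the obvious cases without becoming a bypass. It also means identical prompts always get identical regex results, so the same prompt being flagged for one person and not another would point to the model stage or to the rules changing over time.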
IIRC, Opus 4.7 had the same problem: the safety filters were triggered way too easily at the beginning.
sunglasses _are_ safety filters