Didn’t they all but admit they’ve been storing and actively looking at requests with this post: https://www.anthropic.com/news/detecting-and-preventing-dist... ?

If they weren’t storing, they’d be oblivious to what customers are doing, making this kind of detection impossible. What data did they train their classifier on, if not real user (distiller) traffic?

Couldn’t they have trained the classifier on internal red-teaming data instead?
They basically said "DeepSeek ran 150,000 requests and here's the gist of one of their prompts". Anthropic doesn't know beforehand which accounts are DeepSeek proxies, so it definitely sounds like retrospective analysis of broad user logs to me.

Of course Anthropic realizes that saying this outright is problematic, so they said they examined request metadata. But no, I don't think they can get this kind of insight from metadata alone (token counts, request times, etc.).