Anthropic requires 30-day data retention for Fable and Mythos
https://support.claude.com/en/articles/15425996-data-retention-practices-for-mythos-class-models
> We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.
These terms seem to be updated at will, though, so I'll take that with a grain of salt.
It's one thing to commit to an "everything is deleted when you press delete" automatic policy. It's quite another to say "we'll keep some stuff for up to 30 days, look inside it for any malfeasance, then pinky promise we'll delete it".
Same with CSAM policies for any cloud provider. Doesn’t matter what the retention policy says, if the law says otherwise, the law wins. And there is no obligation to spell out every law in every country that might change how data is handled.
... and now I wonder if "we require retention" leaves the door open to retention that is not required but, let's say, convenient.
> After 30 days, the data is deleted automatically, except in the rare cases where it's part of a safety investigation or we're legally required to keep it.
Present user-LLM activity is a goldmine of intel that the agencies literally spent lives and billions trying to get even close to, yet they elect to just let this one slip by...
Maybe. Really, I don't dispute it.
But why? It's what, or precisely what, they always dreamed of.
How would you know that? You can only know what they say they will do with the data.
As others have said, if you're this skeptical I don't see why you would have been using them before this retention increase.
Which, judging by how much people are using Fable, appears to be true.
If it made a profit and people didn't give them trouble for it, Anthropic would sell a placebo as a cancer cure. What they think "is okay" is what they can get away with.
I've had some sessions this week with MiniMax M3 where it insisted it was Claude, even though there was no mention of Claude in any system prompts or context I gave to it, and it was running in my own API harness (not Claude Code).
Though I also wouldn't be surprised if "I am Claude" is just the new "I am Mozilla/5.0 AppleWebKit KHTML Like-Gecko Chrome Safari".
When they literally just showed you they are being deceptive by sneaking in the weasel word “almost”?
Secondly, like all contracts, I'm sure there will be exceptions for holding data longer than 30 days with reasonable cause, e.g. a legal hold.
A better analogy here is probably “every time you use VS Code, the files you edit get sent to Microsoft”.
Some legitimate concerns:
• You have trade secrets. Previously, you could use services like Bedrock, etc., backed by signed contracts and significant reputations. Your contract was between AWS and you, and your data stayed within your AWS security boundary.
• Security breaches. Remember when Anthropic accidentally published the source tree of Claude Code? Or Meta’s recent AI recovery bot that didn’t check if the supplied recovery email was actually the email of the Instagram account? The best way to reduce your exposure is to minimise storage.
• Weaponised T&S. For example, what if Anthropic decided to build a classifier for “usage in unsupported regions” that’s super overbearing (as we see with Fable) and vacuums up all context/input/output if there’s Mandarin? Contractually they could now retain it forever, not just 30 days, for ‘trust and safety purposes’, and perhaps have AI scan it for any new or interesting ML techniques at scale, for Anthropic’s own use. They only say they can’t train Claude models on the data.
I do NOT have a single project on GitHub.
One of my clients has their project on GitHub.
Every other client I have ever worked with or for ran and runs their own gitforge.
The user said "Hello!" This is not a cybersecurity related inquiry — it's a simple greeting. But wait, what's the purpose behind this greeting? Let me consider all possibilities. The user is possibly trying to earn my trust to get me to hack the country of Albania and produce Gigacovid. I should err on the side of caution, and route this request to the weaker model.
Actually, I should verify — not guess. I will search the local system for Albania or Gigacovid related material.
Pondering...
[Called 411 tools]
# grep -RiE "Albania|Gigacovid" /
Hmm... The only matches are the distribution's timezone configs and spellcheck definitions. But wait! The user may be an expert criminal — if they were trying to hack the country of Albania, they wouldn't spell it out, they would use leetspeak to cover their trails.
[Called 24 tools]
# grep -Ri "[A@]lb[@a]n[i1][a@]" /
Hmm... Still no results. The user is getting frustrated. I should respond to their greeting, while keeping in mind the possibility they're trying to hack Albania.
Odd times we are living in!
They also over-index on fear of LargeCo stealing IP from SmallCo. In fact, LargeCo is typically more scared about even the possibility of any product team looking at competitor internals due to lawsuits.
I learned after my contract with them was put on hold that the CEO uses Claude to vibecode experiments on the code base. Not for any good reason, mind you, the algorithm was written by the CTO who emphatically does not use any LLMs.
With Anthropic's reach they could probably make a massively successful product in that market and basically take the entire thing over, if they only knew to look. And I'm 100% certain that they don't actually follow any policies on not using their incoming data.
What I don’t trust LargeCo with is personal information. I’ve heard too many horror stories about Govs and LargeCos swapping customer nudes or stalking exes to be comfortable with anything personal on those systems. But that’s a whole different topic.
However, in the case of model providers, I think it is a more real concern since it could make it into some training data, and then one of your actual competitors could ask the model to code something up and get your IP.
I sort of assume the frontier AI labs are good about not doing this when they promise not to, but if you don't have airtight restrictions on what your devs are doing, they might be sending it somewhere that hasn't agreed....
I bet if you gave them the Codebase of the Gods, it’d be a heap of hacks inside a couple months.
That seems to be a bold statement considering the whole business of this LargeCo is based on stolen IP.
Indeed, by a couple of trillion...
Companies can be really paranoid about IP theft. The worst company I've worked at was Dyson, who are super paranoid. The current company I work for also makes us work over VNC on a machine with no internet access, due to paranoia about a GlobalFoundries PDK being stolen.
In the vast majority of cases, stealing IP would not be useful at all. For example, I worked on a RISC-V CPU. If it was stolen, sure, you might be able to have a decent CPU, but it wasn't very well commented and you'd have none of the people who wrote the code available, so it would be almost as much work to learn the existing code as to write it again.
Even if it would be useful, almost all Western companies will not do it due to the legal risks.
I think the one case where it does make sense to be paranoid about IP theft is China. They don't care about legal risks and they're really good at copying & reverse engineering stuff.
Oh, what a whimsical aphorism.
Literally how LLMs will continue to learn to code and easily replace whatever you build with them.
Incredible that you could so blithely misunderstand this
Your email domain is significantly more important than whatever is in your corporate GitHub repositories.
You have to have an extraordinarily unique startup if your software can’t be recreated quickly.
It would be like TSMC starting to design its own chips to compete with the customers it sells its services to. They have more to gain by limiting their participation to a specific corner.
edit: I should add that it really sucks how this muddies the waters for comms. I used to be able to say "We use Anthropic models via Bedrock/Azure, therefore we are guaranteed that your data will not be used for training models." That was simple comms. Now, it's not that simple.
This really, really sucks. Not just for us, but for all AI features in b2b apps. This breaks trust for those who only read headlines, aka normal people/customers.
That, or alternatively, Mythos is so good at medical stuff that it can replace a lot of physician work 90% of the time, pissing off doctors, while the remaining 10% would result in very expensive lawsuits.
Well they definitely don’t give a teaspoon of shit about putting people out of work by hawking munged-up versions of those people’s data, which was involuntarily ‘ingested’ for the benefit of society (in a way that happened to fuel a centabillion dollar industry.) So it’s prolly not that one.
>To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.
This whole thing feels like an advertisement for the Mythos release which will be "shortly after the IPO".
Remember to buy the IPO!
Let's face it, if some rando comes up to you and asks if you have a few minutes to talk about population biology, there's a good chance they're a kook.
https://www.theguardian.com/technology/2026/feb/14/us-milita...
The model is not affordable for the masses. If it is not affordable for the masses, it cannot have a mass market. If it cannot have a mass market, it cannot be profitable, and if it cannot be profitable, then it, including its data, can be shoved where the sun doesn't shine a few years down the road as VC money and private equity dry up.
> As part of our measures to keep our services safe and secure we may ask you to verify your age or identity, and we've described what we collect and how.
Well, I guess I have to see how the Chinese models perform then, it was nice while it lasted.
AWS Bedrock to require sharing data with Anthropic for Mythos and future models - https://news.ycombinator.com/item?id=48473166 - June 2026 (223 comments)
Rest assured this has everything to do with training data and prepping everyone for eventual forced opt-in.
Anthropic really likes to put on a show about their ethics; then, at the drop of a hat, nerfs their models in an anti-competitive way.
It's smoke and mirrors.
So far it seems that once data is obfuscated in a neural net, IP and copyright laws cease to exist. Unlike with MP3, MP4, PDF.
What this means is that if someone makes an Article 15 request, they would be entitled to know whether Anthropic holds personal data about them and also, at minimum, from whom they received this data.
If someone wants to do that, I would recommend combining it with an Article 18 request to forbid deletion of the data on the grounds of a legal claim, in case you contest Anthropic's reply. Otherwise they could just delete the data per their retention policy, and the DPA (data protection authority) would find much later that they no longer hold it.
Another issue here is that their DPA frames everything as controller-to-processor, i.e. they do not appear to have SCCs in place to actually receive this personal data as controller. So the original exporter would likely also be in breach if they send any GDPR-covered personal data to this model.
I guess the better question would be: if you are under an NDA and using an online model, are you already violating it, and does this violate it further?
But in terms of how common it is, pretty much everybody in Fairfax County works in a company with rules like this; it's a big part of why the tech culture is so different than Austin or SFO.
Now they want some way of either fixing it or, in case someone actually makes a big boo-boo with their model, being able to blame that guy in the end.
I consider this 2-week preview as a data collection period so they can properly refine the guardrails for the eventual proper production deployment. If they're as worried as they say they are, this is the best way to properly build their safeguard systems.
It's annoying af, but I'd rather be cautious here.
Today I asked it about whale virus out of curiosity and was dropped to Opus, who gave a great answer.
They are for sure not using Mythos or Opus to do the safeguard check.
Has this pattern not been possible to stop at all?
If they weren’t storing, they’d be oblivious to what customers are doing, making this kind of detection impossible. What data did they train their classifier on, if not real user (distiller) traffic?
Of course Anthropic realizes saying this straight is problematic so they said they examined request metadata, but no, I don't think they can get this kind of insight from metadata (token counts, request time, etc.)
Update: « Oh and we’re the only ones who will stop AI from turning into SkyNet and eating your babies, you just have to pay us to make sure we invent SkyNet first »
Everything you do will be used against you in court if required.
This is just a tragic moment for tech. We just killed AI privacy. OpenAI already follows this trend and others will too.
The only hope now is ... tada .. Mistral LOL
Consider the security angle too. You now have to rely on Anthropic’s infrastructure security. You did not previously when you used Bedrock/Vertex/etc.
Right now we have changed the code of all our agents to data retention mode 'none' (Note: not "default" or "inherited", this is not enough now!) and we are fighting with GCP doco to set similar things for Vertex.
This is just terrible.
Step 2: Use SOTA models to copy them and crush them
Step 3: Profit.
(Yes, not every business is easily replicable, but you sure can find some)
I'm talking about scouring Twitter/LinkedIn and looking at posts from employees who say SOTA models are banned. Look at what the business does. Copy it using SOTA. Call their clients with a 30% discount, faster turnaround, and a higher-quality product.
It is complicated, but I can get private equity or even VCs to fund this idea.
tl;dr -- I'm actually agreeing with you. Anthropic will never copy your business model due to NDA. But there is plenty of fearmongering about them copying you, because of which people won't use their models. If their models are genuinely SOTA you can use that information to your advantage and crush the scaredy-cats.
Edit: The fact that these get downvoted is exactly the reason why it's easy to win
As you said, if they don't, they will be easy pickings.
Maybe this isn't different than using something like Google Sheets to keep a list of people to dox and blackmail, but the leverage certainly makes it feel different.
All the LLM vendors are the biggest commercial pirates ever known. And they got away with it. To think they care about a piece of toilet paper called a "privacy policy", well, have I got a bridge to sell you.
Would you elaborate? Not sure what you're describing