Story Detail of id 48483821 | Liveview Hacker News

connorboyle 20 hours ago | on: Anthropic requires 30 day data retention for Fable and Mythos

A startup that uses agentic coding tools such as Claude Code or Codex is packaging up their entire codebase and sending it directly to their LM provider. Depending on their product, they might be sending it directly to a potential competitor.

Odd times we are living in!

ai-x 20 hours ago | parent | next

people over-rate how much software/IP is useful in running a successful business. There is genuinely very little IP in this world that needs to be protected. Everyone else is running stupid CRUD apps.

They also over index fear of LargeCo stealing IP from SmallCo. In fact, LargeCo is typically more scared of even the possibility of a product team looking at competitor internals, due to the risk of lawsuits.

59nadir 11 hours ago | root | parent | next

I've worked with a company that literally has a one-of-a-kind product: it is the only product in its niche that uses a very specific, custom algorithm to run its workload 500-1000 times faster than the competition. Products in that niche affect large-scale workflows, where planning with them alone can net millions of dollars in savings per project.

I learned after my contract with them was put on hold that the CEO uses Claude to vibecode experiments on the code base. Not for any good reason, mind you; the algorithm was written by the CTO, who emphatically does not use any LLMs.

With Anthropic's reach they could probably make a massively successful product in that market and basically take the entire thing over, if they only knew to look. And I'm 100% certain that they don't actually follow any policies on not using their incoming data.

Iolaum 5 hours ago | root | parent | next

They (Anthropic) don't need to "look" at the data. Just use it to train the next model, and then that company's competitors can ask the new model how to improve their own product :p

freejazz 1 hour ago | root | parent | next

Goodbye, trade secret!

mxkopy 7 hours ago | root | parent

This is what bugs me about the whole AI fanaticism thing coming from the top down: what evidence is there that the AI labs aren’t going to try to eat everyone’s lunch once they’ve done whatever they need to do to develop the actual AI? We’ve already seen this with Gemini and OpenAI trying to eat video production and making workflows explicitly for that purpose. What makes people think that Claude isn’t going to do the exact same thing once they get bored of making models? It’ll all be under the guise of “making [lucrative niche] accessible to anyone”, while they just disappeared the moat that you willingly handed them.

59nadir 4 hours ago | root | parent | next

Yeah, I really don't know what people are thinking. We deliberately didn't use any LLMs in the development of the project, specifically to avoid leaking anything (though admittedly also because we just didn't think they were particularly useful at the time, even for smaller things). The same CEO is also deathly afraid of people reverse-engineering the application, so I have no idea how he reconciles these two things. I would've thought it's either fine to blast the codebase out there to essentially unknown parties and also fine to deliver a binary without shitting your pants, or it's not fine to do either.

deaton 5 hours ago | root | parent

We've also seen ample evidence that AI labs are not overly concerned with the legality of how they obtain training data. It's not a stretch to say maybe they look at some other stuff they shouldn't, too.

hnlmorg 20 hours ago | root | parent | next

I’d be more scared of a data leak due to LargeCo being hacked than I would about LargeCo prying into the data.

What I don’t trust LargeCo with is personal information. I’ve heard too many horror stories about Govs and LargeCos swapping customer nudes or stalking exes to be comfortable with anything personal on those systems. But that’s a whole different topic.

rkozik1989 4 hours ago | root | parent

Well, I mean, basically any data leak violates privacy laws and opens you up to lawsuits that are extremely expensive to litigate. Anyone dealing with healthcare/patient data, police customers, military customers, etc. should not be using LLMs in general, or at least not ones that are not on-premise, because a data leak could bankrupt the business.

hnlmorg 4 hours ago | root | parent

There is a massive difference between using LLMs as coding agents, and using them to analyze PII like healthcare data.

physicsguy 10 hours ago | root | parent | next

I worked at a very technical engineering software company, and they were super paranoid about the special-sauce IP of a product that analyzed a certain type of data. They couldn't see that all the pieces of that special sauce were actually just functions from SciPy strung together, which you could look up in a textbook. Don't get me wrong, you need the right background to understand it and that's not trivial, but if you got someone from the right area you could replicate it pretty easily.

Eridrus 14 hours ago | root | parent | next

In general, I agree with you.

However, in the case of model providers, I think it is a more realistic concern, since your code could make it into some training data, and then one of your actual competitors could ask the model to code something up and get your IP.

I sort of assume the frontier AI labs are good about not doing this when they promise not to, but if you don't have airtight restrictions on what your devs are doing, they might be sending it somewhere that hasn't agreed....

switchbak 19 hours ago | root | parent | next

LargeCo is probably struggling under the weight of technical debt and organizational challenges/politics.

I bet if you gave them the Codebase of the Gods, it’d be a heap of hacks inside a couple of months.

Peacefulz 13 hours ago | root | parent

I'm at a growing LargeCo now, and have been entrusted with some internal flows as an associate. I honestly don't know how Ops Managers get through the day. So many pipelines with basically non-existent audit trails. So much money leaking through the cracks in these places that it's criminal. I wouldn't trust these people to hold my beer, let alone sensitive data.

sly010 20 hours ago | root | parent | next

> people over-rate how much software/IP is useful in running a successful business

Indeed, by a couple of trillion...

raron 15 hours ago | root | parent | next

> They also over index fear of LargeCo stealing IP

That seems to be a bold statement considering the whole business of this LargeCo is based on stolen IP.

noncoml 19 hours ago | root | parent | next

How can you make such bold and generic claims without some data backing them up?

ai-x 18 hours ago | root | parent | next

actuaries look for data. visionaries take leaps in faith. There was no data proving LLMs would work at scale. Google waited for the data. OpenAI and then Anthropic took the leap of faith. The result is there for all to see. The core attribute of a successful AI researcher was whether they were AGI-pilled, not whether they were waiting for data on unknown unknowns.

nozzlegear 15 hours ago | root | parent | next

> actuaries look for data. visionaries take leaps in faith

Oh, what a whimsical aphorism.

noncoml 15 hours ago | root | parent | next

"trust me bro"

IshKebab 9 hours ago | root | parent

I don't have any data either but I agree with him, based on my experience working for lots of different companies and seeing their attitude to IP, with varying levels of paranoia.

Companies can be really paranoid about IP theft. The worst company I've worked at was Dyson, who are super paranoid. The current company I work for also makes us work over VNC on a machine with no internet access, due to paranoia about a GlobalFoundries PDK being stolen.

In the vast majority of cases, stealing IP would not be useful at all. For example, I worked on a RISC-V CPU. If it were stolen, sure, you might have a decent CPU, but it wasn't very well commented and you'd have none of the people who wrote the code available, so learning the existing code would be almost as much work as writing it again.

Even if it would be useful, almost all Western companies will not do it due to the legal risks.

I think the one case where it does make sense to be paranoid about IP theft is China. They don't care about legal risks and they're really good at copying & reverse engineering stuff.

tsunamifury 18 hours ago | root | parent | next

You could not be more wrong in the aggregate.

This is literally how LLMs will continue to learn to code and easily replace whatever you build with them.

Incredible that you could so blithely misunderstand this.

bob1029 18 hours ago | root | parent

Trust and liability are the actual currency in a software business.

Your email domain is significantly more important than whatever is in your corporate GitHub repositories.

sreekanth850 16 hours ago | parent | next

A startup using GitLab, GitHub, or Bitbucket also has the same risk, right?

c0balt 9 hours ago | root | parent

For self-hosted GitLab or Bitbucket, no. GitHub Enterprise (self-hosted), also no (though that is rather rare).

sreekanth850 7 hours ago | root | parent

We are only talking about SaaS. Every SaaS has access to your data at the disk or storage level.

puttycat 7 hours ago | parent | next

100%. Companies are paperclip optimizers, with money as the objective. For example, Uber used ride data to circumvent investigations by regulators. There is absolutely no reason to assume that AI companies would not use their data in any way possible to reach their objectives.

drchaim 20 hours ago | parent | next

and all their keys, because sooner or later, the harness is gonna read them

fastball 14 hours ago | root | parent | next

Claude Code is actually very good at not reading your keys these days.

drchaim 9 hours ago | root | parent

Not the case for me. I tried .envs, ansible-vault and sops, and it always ends up reading the unencrypted ones for some reason. Usually in debugging sessions, it finds a way to read them.

fastball 3 hours ago | root | parent

Well it reads them, but (at least for me) it reads them in a way where it filters out the actual key values.
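
For illustration only, here is a minimal sketch of what "filtering out the actual key values" could look like for a simple dotenv-style `KEY=value` file. The thread does not describe how Claude Code does this internally, so nothing here reflects its real mechanism; `redact_env` and the `<redacted>` placeholder are hypothetical names, and multi-line values are not handled.

```python
import re

# Hypothetical helper (not any tool's real implementation): keep the variable
# names from dotenv-style text but replace every value with a placeholder.
_ASSIGNMENT = re.compile(r"^(\s*(?:export\s+)?[A-Za-z_][A-Za-z0-9_]*\s*=).*$")


def redact_env(text: str, placeholder: str = "<redacted>") -> str:
    redacted = []
    for line in text.splitlines():
        match = _ASSIGNMENT.match(line)
        # Comments and blank lines pass through; assignments lose their value.
        redacted.append(match.group(1) + placeholder if match else line)
    return "\n".join(redacted)


sample = "# local config\nAPI_KEY=sk-123\nexport DB_URL = postgres://u:p@h/db\n"
print(redact_env(sample))
```

A filter like this only helps if the harness reads the file through it; it does nothing about secrets that show up in logs, shell history, or process environments during a debugging session.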

ai-x 20 hours ago | root | parent

One company's irrational fear is a competitive advantage for someone else.

tobyhinloopen 14 hours ago | parent | next

You mean these tools you can now rebuild at the cost of a night and one Claude Code subscription?

You have to have an extraordinarily unique startup if your software can’t be recreated quickly.

skybrian 19 hours ago | parent | next

Yes, it certainly is an odd situation when some people believe you cannot use Mythos-class models because security while others believe you must do code reviews with Mythos-class models because security.

Ifkaluva 20 hours ago | parent | next

Not just “a startup”! Also, famously, Meta, with their famous AI usage dashboards

stainablesteel 17 hours ago | parent

they would kill their own product if they did this

it would be like if TSMC started designing their own chips to compete with the people they sell their services to; they have more to gain by limiting their participation to a specific corner
