A startup that uses agentic coding tools such as Claude Code or Codex is packaging up their entire codebase and sending it directly to their LM provider. Depending on their product, they might be sending it directly to a potential competitor.

Odd times we are living in!

people over-rate how much software/IP is useful in running a successful business. There is genuinely very little IP in this world that needs to be protected. Everyone else is running stupid CRUD apps.

They also over index fear of LargeCo stealing IP from SmallCo. In fact, LargeCo is typically more scared about even the possibility of any product team looking at competitor internals due to lawsuits.

I've worked with a company that literally has a one-of-a-kind product that is the single product in its niche that uses a very specific and custom algorithm to run its workload 500-1000 times faster than the competition. Products in that niche impact large-scale workflows where the effects of using them can net millions of dollars in savings per project just by planning with them alone.

I learned after my contract with them was put on hold that the CEO uses Claude to vibecode experiments on the code base. Not for any good reason, mind you, the algorithm was written by the CTO who emphatically does not use any LLMs.

With Anthropic's reach they could probably make a massively successful product in that market and basically take the entire thing over, if they only knew to look. And I'm 100% certain that they don't actually follow any policies on not using their incoming data.

They (Anthropic) don't need to "look" at the data. Just use it to train the next model, and then their competitors can ask the new model how to improve their product :p
Goodbye trade secret!
This is what bugs me about the whole AI fanaticism thing coming from the top down: what evidence is there that the AI labs aren’t going to try and eat everyone’s lunch after they’ve done whatever they need to do to develop the actual AI? We’ve already seen this with Gemini and OpenAI trying to eat video production and making workflows explicitly for that purpose. What makes people think that Claude isn’t going to do the exact same thing once they get bored of making models? It’ll all be under the guise of “making [lucrative niche] accessible to anyone”, and meanwhile they just disappeared the moat that you willingly handed them.
Yeah, I really don't know what people are thinking. We specifically didn't use any LLMs in the development of the project, precisely so as not to leak anything (though admittedly also because we just didn't think they were particularly useful at the time, even for smaller things). The same CEO is also deathly afraid of people reverse-engineering the application, so I have no idea how he reconciles these two things. I would've thought it's either fine to blast the codebase out there to essentially unknown parties and also fine to deliver a binary without shitting your pants, or it's not fine to do either.
We've also seen ample evidence that AI labs are not overly concerned with the legality of how they obtain training data. It's not a stretch to say maybe they look at some other stuff they shouldn't too.
I’d be more scared of a data leak due to LargeCo being hacked than I would about LargeCo prying into the data.

What I don’t trust LargeCo with is personal information. I’ve heard too many horror stories about Govs and LargeCos swapping customer nudes or stalking ex’s to be comfortable with anything personal on those systems. But that’s a whole different topic.

Well, I mean, basically any data leak violates privacy laws and opens you up to lawsuits that are extremely expensive to litigate. Anyone dealing with healthcare/patient data, police customers, military customers, etc. should not be using LLMs in general, or at least not ones that aren't on-premise. Because if there is a data leak it could bankrupt the business.
There is a massive difference between using LLMs as coding agents, and using them to analyze PII like healthcare data.
I worked at a very technical engineering software company, and they were super paranoid about the special-sauce IP of a product that did analysis of a certain type of data. They couldn't see that all the pieces of that special sauce were actually just functions from SciPy strung together, which you could look up in a textbook. Don't get me wrong, you need the right background to understand it and that's not trivial, but if you got someone from the right area you could replicate it pretty easily.
In general, I agree with you.

However, in the case of model providers, I think it is a more real concern since it could make it into some training data, and then one of your actual competitors could ask the model to code something up and get your IP.

I sort of assume the frontier AI labs are good about not doing this when they promise not to, but if you don't have airtight restrictions on what your devs are doing, they might be sending it somewhere that hasn't agreed....

LargeCo is probably struggling under the weight of technical debt and organizational challenges/politics.

I bet if you gave them the Codebase of the Gods, it’d be a heap of hacks inside a couple months.

I'm at a growing LargeCo now, and have been entrusted with some internal flows as an associate. I honestly don't know how Ops Managers get through the day. So many pipelines with basically non-existent audit trails. So much money leaking through the cracks in these places that it's criminal. I wouldn't trust these people to hold my beer, let alone sensitive data.
> people over-rate how much software/IP is useful in running a successful business

Indeed, by a couple trillions...

> They also over index fear of LargeCo stealing IP

That seems to be a bold statement considering the whole business of this LargeCo is based on stolen IP.

How can you make such bold and generic claims without some data backing it?
actuaries look for data. visionaries take leaps in faith. There was no data proving LLMs would work at scale. Google waited for the data. OpenAI and then Anthropic took the leap of faith. The result is there for all to see. The core attribute of a successful AI researcher was whether they were AGI-pilled, not whether they were waiting for data on unknown unknowns.
> actuaries look for data. visionaries take leaps in faith

Oh, what a whimsical aphorism.

I don't have any data either but I agree with him, based on my experience working for lots of different companies and seeing their attitude to IP, with varying levels of paranoia.

Companies can be really paranoid about IP theft. The worst company I've worked at was Dyson, who are super paranoid. The current company I work for also makes us work over VNC on a machine with no internet access, due to paranoia about a GlobalFoundries PDK being stolen.

In the vast majority of cases, stealing IP would not be useful at all. For example, I worked on a RISC-V CPU. If it was stolen, sure, you might be able to have a decent CPU, but it wasn't very well commented and you'd have none of the people who wrote the code available, so it would be almost as much work to learn the existing code as to do it again.

Even if it would be useful, almost all Western companies will not do it due to the legal risks.

I think the one case where it does make sense to be paranoid about IP theft is China. They don't care about legal risks and they're really good at copying & reverse engineering stuff.

You could not be more wrong in the aggregate.

This is literally how LLMs will continue to learn to code and easily replace whatever you build with them.

Incredible that you could so blithely misunderstand this

Trust and liability are the actual currency in a software business.

Your email domain is significantly more important than whatever is in your corporate GitHub repositories.

A startup using GitLab, GitHub, or Bitbucket also has the same risk, right?
For self-hosted GitLab or Bitbucket, no. GitHub Enterprise (self-hosted) also no (though that is rather rare).
We are only talking about SaaS. Every SaaS has access to your data at the disk or storage level.
100%. Companies are paperclip optimizers, with money as the objective. For example, Uber used ride data to circumvent investigations by regulators. There is absolutely no reason to assume that AI companies would not use their data in any way possible to reach their objectives.
and all their keys, because sooner or later, the harness is gonna read them
Claude Code is actually very good at not reading your keys these days.
Not the case for me. I tried .envs, ansible-vault and sops, and it always ends up reading the unencrypted ones for some reason. Usually in debugging sessions, it finds a way to read them.
Well it reads them, but (at least for me) it reads them in a way where it filters out the actual key values.
One company's irrational fear is a competitive advantage for someone else.
You mean these tools you can now rebuild at the cost of a night and one Claude Code subscription?

You have to have an extraordinarily unique startup if your software can’t be recreated quickly.

Yes, it certainly is an odd situation when some people believe you cannot use Mythos-class models because security while others believe you must do code reviews with Mythos-class models because security.
Not just “a startup”! Also, famously, Meta, with their famous AI usage dashboards
they would kill their own product if they did this

it would be like if TSMC started designing their own chips to compete with the people they sell their services to. they have more to gain by limiting their participation to a specific corner