Story Detail of id 47316071 | Liveview Hacker News

NewsaHackO 6 hours ago | on: Is legal the same as legitimate: AI reimplementation and the erosion of copyleft

That's why he is saying it's not equivalent. For it to be the same, the LLM would have to train on Minecraft's source code (transforming it into its weights), and you would then have to get it to make a game to Minecraft's specifications solely through prompts. Of course it's copyright infringement if you just give a tool Minecraft's source code and tell it to copy it, just like it would be copyright infringement if you used a copier to copy Minecraft's source code into a new document and said you had recreated Minecraft.

alpaca128 5 hours ago | parent | next

What if Copilot had already been trained with Minecraft code in its dataset? It should be possible to test this by telling the model to continue a snippet from the leaked code, the same way a news website proved their articles were used for training.
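
A rough sketch of that continuation test, in Python. The `complete` callable is a hypothetical stand-in for whatever model API is being probed (nothing here is Copilot's actual interface), and `SNIPPET` is a toy example rather than any real leaked code. The idea is to feed the model the first part of a snippet and measure how much of the true remainder comes back verbatim.

```python
from difflib import SequenceMatcher
from typing import Callable


def common_prefix_len(a: str, b: str) -> int:
    """Count the leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def memorization_probe(
    snippet: str, split_at: int, complete: Callable[[str], str]
) -> dict:
    """Prompt with snippet[:split_at] and compare the model's
    continuation against the real remainder of the snippet."""
    prompt, expected = snippet[:split_at], snippet[split_at:]
    # Only compare as many characters as the real remainder contains.
    produced = complete(prompt)[: len(expected)]
    return {
        "verbatim_chars": common_prefix_len(produced, expected),
        "similarity": SequenceMatcher(None, produced, expected).ratio(),
    }


# Toy data (assumption): a made-up snippet, split after its first line.
SNIPPET = "def add(a, b):\n    return a + b\n"
SPLIT = len("def add(a, b):\n")


def fake_complete(prompt: str) -> str:
    """Hypothetical model that has memorized SNIPPET exactly."""
    return SNIPPET[len(prompt):]


if __name__ == "__main__":
    print(memorization_probe(SNIPPET, SPLIT, fake_complete))
```

A long verbatim match on text that is unlikely to be written independently would suggest memorization. A short match or low similarity proves little either way, since the model may have seen the code without reproducing it.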

NewsaHackO 4 hours ago | root | parent

I feel as though the fact that you are asking a valid question shows how transformative it is; clearly, while the LLM gets a general ability to code from its training corpus, the data gets so transformed that it's difficult to tell what exactly it was trained on except a large body of code.

paxys 5 hours ago | parent | next

Is there a legal distinction between training, post-training, fine-tuning, and filling up a context window?

In all of these cases an AI model is taking a copyrighted source, reading it, jumbling the bytes and storing it in its memory as vectors.

Later a query reads these vectors and outputs them in a form which may or may not be similar to the original.

phendrenad2 4 hours ago | parent

It's not equivalent, but it's close enough that you can't easily dismiss it.
