Hacker News new | past | comments | ask | show | jobs | submit
That's why he is saying it's not equivalent. For it to be the same, the LLM would have to train on/transform Minecraft's source code into its weights, then you prompt the LLM to make a game using the specifications of Minecraft solely through prompts. Of course it's copyright infringement if you just give a tool Minecraft's source code and tell it to copy it, just like it would be copyright infringement if you used a copier to copy Minecraft's source code into a new document and say you recreated Minecraft.
What if Copilot was already trained with Minecraft code in the dataset? Should be possible to test by telling the model to continue a snippet from the leaked code, the same way a news website proved their articles were used for training.
I feel as though the fact that you are asking a valid question shows how transformative it is; clearly, while the LLM gets a general ability to code from its training corpus, the data gets so transformed that it's difficult to tell what exactly it was trained on except a large body of code.
loading story #47318207
loading story #47317473
Is there a legal distinction between training, post-training, fine tuning and filling up a context window?

In all of these cases an AI model is taking a copyrighted source, reading it, jumbling the bytes and storing it in its memory as vectors.

Later a query reads these vectors and outputs them in a form which may or may not be similar to the original.

loading story #47318497
loading story #47316726
It's not equivalent, but it's close enough that you can't easily dismiss it.