It's the third sentence of the article:

> the district court ruled that using the books to train LLMs was fair use but left for trial the question of whether downloading them for this purpose was legal.

No, those are separate issues.

The pipeline is something like: download material -> store material -> train models on material -> store models trained on material -> serve output generated from models.

These questions focus on the inputs to model training; the question I raised focuses on the model's outputs. If [certain] outputs are considered derivative works of the input material, then we have a cascade of questions about which parts of the pipeline are covered by the license requirements. Even if some or all of the upstream parts of this simplified pipeline are considered legal, that does not imply that the rest of the pipeline is compliant.

Consider the net effect and the answer is clear. When these models are properly "trained", are people going to look for the book or a derivative of it, with proper attribution?

Or is the LLM going to regurgitate the same content with zero attribution, and shift all the traffic away from the original work?

When viewed in this frame, it is obvious that the work is derivative and then some.

That is your opinion, but the judge disagreed with you. The decision may yet be overturned on appeal, but as it stands, in that courtroom, the training was fair use.
I can memorize a song and that will be fair use too, but it stops being fair use once I start performing the song publicly. Training itself is quite obviously fair use; what matters is what happens next.
This is also, unfortunately, the only way this can be settled. Making LLM output legally a derivative work would murder the AI gold rush, and nobody wants that.