No, we don't have to, but so far we do, because that's the most legally consistent approach. If you want to change that, you're going to need to pass new laws that may wind up radically redefining intellectual property.
> Has the model really performed an extreme transformation if it is able to produce the training data near-verbatim?
Of course it has, if the transformation is extreme, as it appears to be here. If I memorize the lyrics to a bunch of love songs, and then write my own love song where every line is new, nobody's going to successfully sue me just because I can sing a bunch of other songs from memory.
Also, it's not even remotely clear that the LLM can produce the training data near-verbatim. Generally it can't, unless it's something that it's been trained on with high levels of repetition.
> you're going to need to pass new laws that may wind up radically redefining intellectual property
You're correct that this is one route to resolving the situation, but I think it's reasonable to lean more strongly into the original intent of intellectual property laws: defending creative works as a means of sustaining yourself. That would draw a pretty clear distinction between human creativity and reuse on the one hand and LLMs on the other.
But you're missing the other half of copyright law, which is the original intent to promote the public good.
That's why fair use exists: for the public good. And that's why the main legal argument behind LLM training is fair use -- that the resulting product doesn't compete directly with the originals, and that it serves the public good.
In other words, if you write an autobiography, you're not losing significant sales because people are asking an LLM about your life.