Hacker News new | past | comments | ask | show | jobs | submit
Yea, at the end of the day a big part of this question comes down to whether that copying is fair use and that is an open question with the transformative nature being the primary point in favor of the LLM. But it is copying from some works to another - if it doesn't have some fair use exception it is absolutely violating the licensing of most of the training data. It's a bit different from previous settled case law because it's copying so little from so many billions of different things. I think blocking reproduction is wise by LLM companies for PR purposes but it doesn't guarantee that training is a license exempted activity.
Yup. Of course it's copying. But all expectations are that courts will rule that fair use allows such copying, because of the nature of the transformation.