Hacker News new | past | comments | ask | show | jobs | submit
I'm not sure the best way to describe what it is that LLMs have had to learn to do what they do - minimize next word errors. "World model" seems misleading since they don't have any experience with the real world, and even in their own "world of words" they are just trained as passive observers, so it's not even a world-of-words model where they have learnt how this world responds to their own output/actions.

One description sometimes suggested is that they have learnt to model the (collective average) generative processes behind their training data, but of course they are doing this without knowing what the input was to that generative process - WHY the training source said what it did - which would seem to put a severe constraint on their ability to learn what it means. It's really more like they are modelling the generative process under false assumption that it is auto-regressive, rather than reacting to a hidden outside world.

The tricky point is that LLMs have clearly had to learn something at least similar to semantics to do a good job of minimizing prediction errors, although this is limited both by what they architecturally are able to learn, and what they need to learn for this task (literally no reward for learning more beyond what's needed for predict next word).

Perhaps it's most accurate to say that rather than learning semantics they've learned deep predictive contexts (patterns). Maybe if they were active agents, continuously learning from their own actions then there wouldn't be much daylight between "predictive contexts" and "semantics", although I think semantics implies a certain level of successful generalization (& exception recognition) to utilize experience in novel contexts. Looking at the failure modes of LLMs, such as on the farmer crossing river in boat puzzles, it seems clear they are more on the (exact training data) predictive context end of the spectrum, rather than really having grokked the semantics.