On one hand, I understand what he's saying, and that's why I have been frustrated in the past when I've heard people say "it's just fancy autocomplete" without emphasizing the awesome capabilities that can give you. While I haven't seen this video by Sutskever before, I have seen a very similar argument by Hinton: in order to get really good at next token prediction, the model needs to "discover" the underlying rules that make that prediction possible.
All that said, I find his argument wholly unconvincing (and again, I may be waaaaay stupider than Sutskever, but there are other people much smarter than I who agree). And the reason for this is because every now and then I'll see a particular type of hallucination where it's pretty obvious that the LLM is confusing similar token strings even when their underlying meaning is very different. That is, the underlying "pattern matching" of LLMs becomes apparent in these situations.
As I said originally, I'm really glad VCs are pouring money into this, but I'd easily make a bet that in 5 years that LLMs will be nowhere near human-level intelligence on some tasks, especially where novel discovery is required.