To clarify, I think it's reasonable that token prediction as a training objective could lead to AGI, provided the underlying model has the right architecture. The real question is whether that architecture is good enough to capitalize on the training objective and produce superhuman intelligence.
For example, you'll have little luck achieving AGI with decision trees, no matter what their training objective is.
My objection is more about the data used for training, assuming we're talking about unsupervised learning. Text alone just won't cut it.