Story Detail of id 48365308 | Liveview Hacker News

avianlyric 18 hours ago | on: OpenAI frontier models and Codex are now available on AWS

Focusing on the pace of data creation ignores the fact that most of the big gains in LLM “intelligence” have come from scraping and feeding in the huge amount of public data that already exists.

Unless we’re producing data on the order of an entire new internet every couple of years, it’s hard to see how LLMs can achieve further leaps in capability as large as the one from training on effectively 0% of the internet to training on 100% of it.

kopirgan 15 hours ago | parent | next

That is without going into the fact that many people already use AI to type out and write stuff. I have a customer in the Far East who routinely uses it even for simple emails, since he is not so familiar with English.

fragmede 15 hours ago | parent

The majority of the gains come from the size of the supercomputers used to train them. That's still growing. The algorithms used and how the data is annotated are also part of the secret sauce.
