Hacker News
This is certainly a great, immediately useful tool, but also one with a relatively small ROI, both the return and the investment. Big tech is aiming for a much bigger return on a clearly bigger investment. That's potentially going to look like a lot of useless stuff in the meantime. Also, if it weren't for big tech and big investments, these tools/models wouldn't even exist at this level of sophistication for others to use in applications like this one.
While the press lumps it all together as "AI", you have to differentiate LLMs (driven by big tech and big money) from the unrelated image/video generative models and approaches like diffusion, NeRF, and Gaussian splatting, which have their roots in academia.
LLMs don't have their roots in academia?
Not anymore.
Not at all: the Transformer was invented by a group of (now former) Google employees, while at Google, primarily Jakob Uszkoreit and Noam Shazeer. Of course, as with anything, it builds on what had gone before, but it's really quite a novel architecture.
The scientific impact of the Transformer paper is large, but in my opinion its novelty is vastly overstated. The primary novelty is adapting the (already existing) dot-product attention mechanism to be multi-headed. And frankly, the single-head -> multi-head evolution wasn't particularly novel: it's the same trick the computer vision community had applied to convolutions five years earlier, yielding the widely adopted grouped convolution.

The lasting contribution of the Transformer paper is really just ordering the existing architectural primitives (attention layers, feedforward layers, normalization, residuals) into a nice, reusable block.

In my opinion, the most impactful contributions in the lineage of modern attention-based LLMs are the introduction of dot-product attention (Bahdanau et al., 2015) and the first attention-based sequence-to-sequence model (Graves, 2013). Both of these are from academic labs.
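To make the grouped-convolution analogy concrete, here's a minimal numpy sketch of multi-head attention as "split the feature dimension into groups, run the same single-head attention on each group, concatenate". The learned projection matrices from the actual Transformer are omitted for brevity, so this is an illustration of the structure, not a full implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention, the pre-existing single-head primitive.
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    return softmax(scores) @ v

def multi_head(q, k, v, n_heads):
    # "Multi-head" = partition the feature dimension into n_heads groups
    # and apply single-head attention independently to each group --
    # structurally the same move as grouped convolution.
    qs = np.split(q, n_heads, axis=-1)
    ks = np.split(k, n_heads, axis=-1)
    vs = np.split(v, n_heads, axis=-1)
    heads = [attention(qi, ki, vi) for qi, ki, vi in zip(qs, ks, vs)]
    return np.concatenate(heads, axis=-1)
```

The output has the same shape as a single head over the full dimension; the heads just attend over smaller subspaces independently.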

As a side note, a similar phenomenon occurred with the Adam optimizer, where the ratio of public/scientific attribution to novelty is disproportionately large: the Adam optimizer is a very minor modification of the RMSProp + momentum optimization algorithm presented in the same Graves, 2013 paper mentioned above.
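The two update rules can be put side by side to show how small the gap is. This is a hedged sketch: it uses one common formulation of RMSProp with momentum (details vary between presentations, including Graves's), and the Adam update follows the standard published form with bias correction.

```python
import numpy as np

def rmsprop_momentum_step(w, g, m, v, lr=1e-3, beta=0.9, gamma=0.99, eps=1e-8):
    # RMSProp + momentum (one common formulation):
    v = gamma * v + (1 - gamma) * g**2          # running average of squared grads
    m = beta * m + lr * g / (np.sqrt(v) + eps)  # momentum on the rescaled grad
    return w - m, m, v

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: the same two running averages, with the momentum applied to the
    # raw gradient and a bias correction for the zero-initialized averages.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)                  # bias-corrected first moment
    v_hat = v / (1 - beta2**t)                  # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

Both track a running average of squared gradients to rescale the step and a momentum-style running average of the gradient; the main additions in Adam are the bias-correction terms.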

This makes no sense. A thing's roots don't change, either it did start there or it didn't.
It didn't.

At least, the Transformer didn't. The abstract idea of a language model goes way back within the field of linguistics, though, and people were building simplistic "N-gram" models before ever using neural nets, then using other types of neural net such as LSTMs and CNNs(!) before Google invented the Transformer (primarily with the goal of fully utilizing the parallelism available from GPUs, which couldn't be done with a recurrent model like the LSTM).

On the plus side for Adobe, they have a fairly stable and predictable SaaS revenue stream, so as long as their R&D and product hosting costs don't exceed their subscription revenue, they're OK. This is wildly different from, for example, the hyperscalers, who have to build and invest far in advance of a market [for new services especially].