What's the structurally simplest architecture that has worked to a reasonably competitive degree?
Competitiveness doesn't really come from architecture, but from scale, data, and fine-tuning data. There has been little innovation in architecture over the last few years, and most of what there is aims at making training or inference more efficient (so you can fit in more data), not at making models "fundamentally smarter".
If your definition of "competitive" is loose enough, you can write your own Markov chain in an evening. Transformer models rely on a lot of prior art that has to be learned incrementally.
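To make the "Markov chain in an evening" point concrete, here is a rough sketch of a word-level chain in plain Python. The function names (`build_chain`, `generate`), the `order` parameter, and the toy corpus are all made up for illustration; a real attempt would want a far larger corpus and probably some smoothing.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Random-walk the chain; stop early at a state with no recorded successor."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a fixed seed is reproducible
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state only appeared at the end of the corpus
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)  # slide the window forward by one word
    return " ".join(out)


# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus, order=1)
print(generate(chain, length=12, seed=0))
```

Sampling uniformly from the list of followers reproduces the observed transition frequencies, since repeated followers appear in the list more than once. Raising `order` makes the output more locally coherent but needs much more text before it stops copying the corpus verbatim.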
Not that loose lol.

I’m thinking it’s still a Llama-style dense decoder-only transformer.