Hacker News
If your definition of "competitive" is loose enough, you can write your own Markov chain in an evening. Transformer models rely on a lot of prior art that has to be learned incrementally.
Not that loose lol.
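For reference, the "write your own Markov chain in an evening" claim is plausible: a word-level chain is just a table mapping each word (or word tuple) to the words observed after it, sampled repeatedly. A minimal sketch (corpus and names here are illustrative, not from the thread):

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length=10, seed=0):
    """Walk the chain: start at a random key, repeatedly sample a successor."""
    rng = random.Random(seed)
    out = list(rng.choice(list(chain)))
    key_len = len(next(iter(chain)))
    while len(out) < length:
        successors = chain.get(tuple(out[-key_len:]))
        if not successors:  # dead end: no observed continuation
            break
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran off the mat"
chain = build_chain(corpus, order=1)
print(generate(chain, length=10))
```

That's the whole trick: no gradients, no attention, just counting and sampling — which is why the bar for "competitive" matters so much in this comparison.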

I’m thinking it’s still Llama / a dense decoder-only transformer.