Hacker News
Most of the arch work is just scaling knobs.

If you swap in weird layer types or change the objective much, people run into ugly failure modes fast. So the field keeps circling the same Transformer blocks and then markets the change as novel, when it's mostly a training and compute tradeoff.