My comment elsewhere goes into but more detail but basically silicon stopped being able to make single threaded code faster in about 2012 - we just have been getting “more parallel cores” since. And now at wafer scale we see 900,000 cores on a “chip”. When 100% parallel coding runs 1 million times faster than your competitors, when following one software engineering path leads to code that can run 1M X, then we will find ways to use that excess capacity - and the engineers who can do it get to win.
I’m not sure how LLMs face this problem.
As soon as the abstractions leak or you run into an underlying issue you suddenly need to understand everything about the underlying system or you're SOOL.
I'd rather have a simpler system I already understand all the proceeding abstractions about.
The overhead of this is minimal when you keep things simple and avoid shiny things.