One only needs to look at GPU driven rendering and ray tracing in shaders to deduce that shader cores and memory subsystems these days have become flexible enough to do work besides lock-step uniform parallelism where the only difference was the thread ID.
Nobody strives for random access memory read patterns, but the universal popularity of buffer device address and descriptor arrays can be taken somewhat as proof that these indirections are no longer the friction for GPU architectures that they were ten years ago.
At the same time, the languages are no longer as restrictive as they once were. People are recording commands on the GPU. This kind of fiddly serial work is an indication that the ergonomics of CPU programming have less of a relative advantage, and that cuts deeply into the tradeoff costs.
GPUs these days have massive cache often hundreds of megabytes large, on top of an already absurd amount of registers. A random read will often load a full cacheline into a register and keep it there, reusing it as needed between invocations.