These GPUs are still big SIMD devices at their core though, no?
Yes and no. No, in that these days GPUs are entirely scalar from the point of view of a single invocation. Using vector types in shaders gains you nothing for arithmetic - it runs no faster than the equivalent scalar code (dual-issue instruction dispatch on recent AMD GPUs being an exception).
But yes, in that a collection of invocations all progressing in lockstep gets its arithmetic done by vector units. GPUs have just gotten really good at hiding what happens when invocations take different branches.
SIMT is a distinct model, and the ergonomics are wildly different. Instead of contracting a long loop by packing its iterations together to make them "wider", you rotate the iteration across cores: each invocation handles one step.
The critical difference: SIMD and ordinary parallel programming have totally different ergonomics, while SIMT feels almost exactly like parallel programming. You have to learn to design for SIMD and for parallelism separately, whereas SIMT and parallelism are essentially the same skill set.
The fan-in / fan-out and iteration rotation are the key skills for SIMT.