VRAM is not everything - GPU cores also matter (a lot) for inference
4x Radeon will have significantly more GPU power than, say, a Mac Studio or DGX Spark.
Inference speed is like monitor Hz: sure, going from 60 to 120Hz is noticeable, but unless your model is AGI, at some point you're just generating more code than you'll ever realistically be able to control, audit and rely on.

So, for programming, context is probably worth more per dollar than inference speed.