There are various ways to run it with less VRAM if you're OK with much worse latency and throughput.
Edit: sorry, this is for v3. The distilled models can be run on consumer-grade GPUs.