There's another possible (orthogonal) explanation: they're using more GPUs for stress-testing before a product launch.
That's less an orthogonal explanation and more an example of why they'd do something like serve a quantized model.