Most people are using something in the llama.cpp family for local inference. `llama-server` is my go-to. The Unsloth guides describe how to configure inference for your model of choice.
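For reference, a typical invocation looks something like this (a sketch, not a definitive setup — the model path is a placeholder, and the right context size and GPU-offload values depend on your model and hardware; check the Unsloth guide for your specific model):

```shell
# Serve a local GGUF model over an OpenAI-compatible HTTP API.
# ./models/model.gguf is a placeholder path — substitute your download.
llama-server \
  -m ./models/model.gguf \
  -c 8192 \        # context window size (tokens); model-dependent
  -ngl 99 \        # layers to offload to GPU (99 = as many as fit)
  --host 127.0.0.1 \
  --port 8080
```

Once it's up, any OpenAI-compatible client can point at http://127.0.0.1:8080/v1 and chat with it.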