Darn I've only got ~20 GB of VRAM. I really need to get a stronger machine for this sort of stuff.
20GB isn't enough for a 13B parameter model? I thought 30B-class models could run on a 24GB RTX x090 card?
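The back-of-the-envelope math supports that: a quantized model needs roughly (parameters × bits-per-weight ÷ 8) bytes for the weights, plus some headroom for the KV cache and activations. A rough sketch (the 1.5 GB overhead figure is my own assumption, not a measured value):

```python
def vram_estimate_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: weight storage plus a flat allowance
    for KV cache and activations (overhead_gb is a guess)."""
    return params_b * bits_per_weight / 8 + overhead_gb

# 13B at ~4.5 bits/weight (e.g. a Q4_K_M-style quant): ~8.8 GB
print(vram_estimate_gb(13, 4.5))
# 30B at the same quant: ~18.4 GB, which is why it squeezes into 24 GB
print(vram_estimate_gb(30, 4.5))
```

So a 13B model at a 4-bit quant fits comfortably in 20 GB; it's unquantized or long-context runs that blow the budget.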
I'm currently shopping for a local LLM setup, deciding between something like the Framework Desktop with 64-128GB of shared RAM or just adding a 3090 or 4090 to my homelab, so I'm very curious what hardware is working well for others.
How much system memory do you have? Llama.cpp can split layers across CPU and GPU. Speeds will be slower of course, but it's not unusable at all.
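For anyone who hasn't tried it: the split is controlled with the `-ngl` / `--n-gpu-layers` flag. A minimal sketch (the binary location and model path are placeholders for your own setup):

```shell
# Offload 20 transformer layers to the GPU; the rest stay in system RAM
# on the CPU. Paths below are examples, not real files.
./llama-cli -m ./models/model.gguf -ngl 20 -p "Hello"
```

Start high and back off until it stops OOMing; every layer you keep on the GPU is a noticeable speedup.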