Agree directionally but you don't need $50k. $5k is plenty, $2-3k arguably the sweet spot.
As a local LLM novice, do you have any recommended reading to bootstrap me on selecting hardware? It has been quite confusing being a latecomer to this game. Googling yields a lot of outdated info.
First answer: If you haven't, give it a shot on whatever you already have. MoE models like Qwen3 and GPT-OSS are good on low-end hardware. My RTX 4060 can run qwen3:30b at a comfortable reading pace even though 2/3 of it spills over into system RAM. Even on an 8-year-old tiny PC with 32gb it's still usable.
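
To put a rough number on the "2/3 spills over" part, here's a back-of-envelope sketch. The ~19 GB size for a 4-bit qwen3:30b download and the 1 GB of VRAM held back for the desktop and KV cache are assumptions, not measurements; 8 GB is the stock desktop RTX 4060. `spill_fraction` is a made-up helper name, not part of any tool.

```python
def spill_fraction(model_gb: float, vram_gb: float, reserved_gb: float = 1.0) -> float:
    """Estimate the fraction of model weights that will not fit in VRAM.

    model_gb    -- size of the quantized weights (assumed, check your download)
    vram_gb     -- total VRAM on the card
    reserved_gb -- VRAM assumed unavailable for weights (display, KV cache, ...)
    """
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    usable_gb = max(vram_gb - reserved_gb, 0.0)
    spilled_gb = max(model_gb - usable_gb, 0.0)
    return spilled_gb / model_gb


if __name__ == "__main__":
    # Assumed figures: ~19 GB model, 8 GB card, 1 GB reserved.
    print(f"{spill_fraction(19.0, 8.0):.0%} of the weights land in system RAM")
```

With those assumed numbers it comes out a bit under two thirds, which lines up with what I see. Plug in your own card and model size.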

Second answer: ask an AI, but prices have risen dramatically since their training cutoff, so be sure to get them to check current prices.

Third answer: I'm not an expert by a long shot, but I like building my own PCs. If I were to upgrade, I would buy one of these:

Framework Desktop with 128gb for $3k, or mainboard-only for $2700 (could just swap it into my gaming PC). Or any other Strix Halo (Ryzen AI Max 385 and above) mini PC with 64/96/128gb; more is better of course. Most integrated GPUs are constrained by memory bandwidth. Strix Halo has a wider memory bus, so it's a good way to get lots of high-bandwidth shared system/video RAM for relatively cheap. Relative GPU power by chip: 380 = 40%, 385 = 80%, 395 = 100%.
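
Why bandwidth matters so much, as a rough sketch: during generation the machine has to read every active weight once per token, so memory bandwidth divided by the size of the active weights gives a ceiling on tokens per second. The 256 GB/s figure for Strix Halo, the ~3B active parameters for a 30B MoE, and the flat 4 bits per weight are all assumptions for illustration; real speeds land well below this ceiling because of KV cache reads, compute, and overhead. `decode_ceiling_tok_s` is a hypothetical name.

```python
def active_weight_gb(active_params_billion: float, bits_per_weight: float) -> float:
    """Size in GB of the weights touched per generated token."""
    return active_params_billion * bits_per_weight / 8.0


def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_billion: float,
                         bits_per_weight: float) -> float:
    """Upper bound on tokens/s if decoding were limited only by memory bandwidth."""
    return bandwidth_gb_s / active_weight_gb(active_params_billion, bits_per_weight)


if __name__ == "__main__":
    bw = 256.0  # GB/s, assumed for Strix Halo
    # MoE: only the active experts are read per token (assumed ~3B active).
    print(f"30B MoE, ~3B active: {decode_ceiling_tok_s(bw, 3.0, 4.0):.0f} tok/s ceiling")
    # Dense model of the same total size reads all 30B weights per token.
    print(f"30B dense:           {decode_ceiling_tok_s(bw, 30.0, 4.0):.0f} tok/s ceiling")
```

The same arithmetic is why MoE models feel so much faster than dense models of similar total size on this kind of hardware.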

I was also considering doing a much hackier build with 2x Tesla P100s (16gb HBM2 each, about $90 each) in a Dell Precision 5820 (cheap, with lots of space and power for GPUs). Total is about $500 for 32gb HBM2 + 32gb system RAM, but it's all 10-year-old used parts, you need to DIY a fan setup for the GPUs, and software support is very spotty. Definitely a tinker project; here there be dragons.

Agree on the Framework. Last week you could get a Strix Halo for $2700 shipped; now it's over $3500. Find a deal on an NVMe, and the Framework with the Noctua is probably going to be the quietest; some of the other Strix Halo boxes are pretty loud and hot.

I run Qwen 122b with Claude Code and nanoclaw. It's pretty decent, but this stuff is nowhere near prime-time ready, though it's super fun to tinker with. I have to keep updating drivers, and I see speed increases and stability being worked on. I can even run much larger models with llama.cpp (--fit on), like Qwen 397b and, I suppose, any larger model like GLM; it's slow but smart.

The 4-bit quants are 350GB, what hardware are you talking about?
qwen3:0.6b is 523mb, what model are you talking about? You seem to have a specific one in mind but the parent comment doesn't mention any.
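
For what it's worth, the file size and the parameter count are tied together by simple arithmetic, so a size figure more or less tells you which model someone has in mind. A minimal sketch, assuming a flat bits-per-weight number and decimal gigabytes; real "4-bit" quants usually average somewhat more than 4 bits per weight, so treat the results as rough. Both function names are made up for illustration.

```python
def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate file size in GB for a quantized model."""
    return params_billion * bits_per_weight / 8.0


def implied_params_billion(file_gb: float, bits_per_weight: float) -> float:
    """Approximate parameter count (in billions) implied by a file size."""
    return file_gb * 8.0 / bits_per_weight


if __name__ == "__main__":
    # A 30B model at a flat 4 bits per weight.
    print(f"30B at 4-bit: about {quant_size_gb(30.0, 4.0):.0f} GB")
    # A 350 GB file at a flat 4 bits per weight.
    print(f"350 GB at 4-bit: about {implied_params_billion(350.0, 4.0):.0f}B parameters")
```

So a 350 GB 4-bit quant points at something in the several-hundred-billion-parameter range, not at the 30B-class models mentioned upthread.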

For a hobby/enthusiast product, and even for some useful local tasks, MoE models run fine on gaming PCs or even older midrange PCs. For dedicated AI hardware I was thinking of Strix Halo, which with 128gb is currently $2-3k. None of this will replace a Claude subscription.