$12,000 for the base model is insane. I have an Apple M3 Max with 128GB RAM that can run 120B parameter models using like 80 watts of electricity at about 15-20 tokens/sec. It's not amazing for 120B parameter models but it's also not 12 grand.
M3 Max TFLOPS is tiny compared to the $12k box. It's not even comparable.
It is very comparable if you work out the $/tok/s on inference. I did some napkin math and it looks like you're getting roughly 3x the performance for 3x the cost, comparing the Red v2 against a Mac Studio M3 Ultra 96GB.
If you compare tokens/kWh efficiency, then my math has the Mac Studio being about 1.5x more efficient.
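If anyone wants to redo the napkin math with their own figures, here's a rough sketch. The prices, power draws, and throughput numbers below are placeholders for illustration, not measurements of either machine.

```python
# Rough $/tok/s and tokens/kWh comparison.
# All numbers are placeholders -- plug in your own price, power, and throughput.

def dollars_per_toks(price_usd, toks_per_sec):
    """Hardware dollars per token/sec of decode throughput."""
    return price_usd / toks_per_sec

def tokens_per_kwh(toks_per_sec, watts):
    """Tokens generated per kWh of wall power at steady decode."""
    return toks_per_sec * 3600 / (watts / 1000)

# Hypothetical figures for illustration only.
mac = {"price": 4000,  "toks": 18, "watts": 80}   # Mac Studio-class box
box = {"price": 12000, "toks": 55, "watts": 400}  # the $12k box

for name, m in (("mac", mac), ("box", box)):
    print(name,
          f"${dollars_per_toks(m['price'], m['toks']):.0f} per tok/s,",
          f"{tokens_per_kwh(m['toks'], m['watts']):.0f} tokens/kWh")
```

With those made-up inputs you land on roughly the same $/tok/s for both and about 1.5x the tokens/kWh for the Mac, which is the shape of the claim above; real numbers will move the ratios.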
The M3 has tolerable decode performance for the price, and that's what people care about most of the time. It underperforms severely on prefill, but that's a fraction of the workload: AI, even agentic AI, spends most of its time outputting tokens, not processing context in bulk. A toy breakdown is sketched below.
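To put a number on that split, here's a toy estimate of where the wall time goes on a single long-ish turn. The prefill/decode rates and token counts are made-up placeholders, not benchmarks, and the balance obviously flips for huge-context, short-answer workloads.

```python
# Toy breakdown of one request: time spent in prefill (processing the
# prompt/context) vs decode (generating output tokens).
# All rates and token counts below are made-up placeholders.

prompt_tokens = 8000    # context fed in
output_tokens = 1000    # tokens generated
prefill_tps = 300       # slow prefill (Mac-class, hypothetical)
decode_tps = 18         # decode speed (hypothetical)

prefill_s = prompt_tokens / prefill_tps   # ~27 s
decode_s = output_tokens / decode_tps     # ~56 s

print(f"prefill: {prefill_s:.0f}s, decode: {decode_s:.0f}s, "
      f"decode share: {decode_s / (prefill_s + decode_s):.0%}")
```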
It's for fools. I bought 160GB of VRAM for $1,000 last year. 96GB of P40 VRAM can be had for under $1,000, and it will run gpt-oss-120b Q8 at probably 30 tok/sec.
The P40 is Pascal architecture, which is no longer receiving driver or CUDA updates, and it's only available as used hardware. Fine for hobbyists, startups, and home labs, but there is likely a growing market of businesses too large to depend on used gear from eBay, yet too small for a full rack solution from Nvidia. That seems to be who they're targeting.
99% of the interest is in inference. If you want to fine-tune a model, just rent the best GPU in the cloud. It's often cheaper and faster.
Great option if you don't mind sharing your data with the cloud. Some businesses want to own the hardware their data resides on.
Renting a GPU, how is that sharing data with the cloud? You can rent a GPU from GCP or AWS.
I suppose if I rent a cloud GPU and just let it sit there dark and do nothing then I wouldn't have to move any data to it. Otherwise, I'm uploading some kind of work for it to do. And that usually involves some data to operate on. Even if it's just prompts.