Nvidia greenboost: transparently extend GPU VRAM using system RAM/NVMe

https://gitlab.com/IsolatedOctopi/nvidia_greenboost

305 | mmastrac | 3 days ago | 62 | HN

Nobody mentioning how this project is vibecoded slop?

  > The code is really bad, with completely unneeded parts. The LLM (Qwen 2.5 7B) has hardcoded the i9 14700KF topology, and has variables related to it that are never used... It's even funnier that the show hardware function always prints the same string. There are even random pip log files. Why did this slop get coverage here?

https://www.phoronix.com/forums/forum/linux-graphics-x-org-d...

nl | 9 hours ago | parent | next

This is really interesting engineering, but I agree with the other commenters that the benchmarking makes it hard to tell how much each factor contributes.

The ExLlamaV3 EXL3 2bpw (8 GB, full VRAM) row is an order of magnitude faster than the baseline, but the baseline seems to be the 32 GB model running with the KV cache shared to system memory only (I think?).

But if an 8 GB model gives sufficient quality, then it seems like that would have worked without the shared memory thing?

I think the useful apples-to-apples benchmark is currently the Ollama + GreenBoost shim (baseline) (2–5 tps) vs ExLlamaV3 + GreenBoost cache (8–20 tps) comparison.

It would be really useful to see this compared with the existing llama CPU/memory offload. There is a note at the start ("Offload layers to CPU — works, but drops token/s by 5–10× because CPU RAM has no CUDA coherence"), but it is unclear whether that 5–10× token speed drop is relative to running a model completely on the GPU or relative to the GreenBoost approach.

I think it is relative to the GPU, in which case it seems likely that the performance is similar to what GreenBoost is giving, but probably much more stable.
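
To put rough numbers on the shim vs cache comparison, here is a minimal sketch that uses only the tps ranges quoted in this thread (2 to 5 tps and 8 to 20 tps). The function name and the idea of pairing the range endpoints for worst and best case are my own assumptions, not anything from the project's benchmarks.

```python
def speedup_range(baseline, candidate):
    """Return (worst_case, best_case) speedup of candidate over baseline.

    Each argument is a (low_tps, high_tps) tuple. The worst case pairs the
    candidate's slowest run with the baseline's fastest run, and the best
    case does the opposite. This is an illustrative assumption, not the
    project's methodology.
    """
    b_lo, b_hi = baseline
    c_lo, c_hi = candidate
    return (c_lo / b_hi, c_hi / b_lo)


# tps ranges as quoted in the thread
ollama_shim = (2.0, 5.0)      # Ollama + GreenBoost shim (baseline)
exllama_cache = (8.0, 20.0)   # ExLlamaV3 + GreenBoost cache

worst, best = speedup_range(ollama_shim, exllama_cache)
print(f"speedup: {worst:.1f}x to {best:.1f}x")  # speedup: 1.6x to 10.0x
```

So depending on which ends of the ranges you compare, the gain is anywhere from 1.6x to 10x, which is wide enough that the ranges alone don't settle much.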

Insanity | 9 hours ago | parent | next

Extend your VRAM using RAM, then extend your RAM using swap.

system2 | 6 hours ago | parent | next

And burn the swap pagesys file to a rewritable DVD to complete the cycle. It will be super fast that way.
