SEQUOIA: Exact Llama2-70B on an RTX4090 with half-second per-token latency | Liveview Hacker News

https://infini-ai-lab.github.io/Sequoia-Page/

131zinccat | 1 week ago | 61 | HN
