
SEQUOIA: Exact Llama2-70B on an RTX4090 with half-second per-token latency

https://infini-ai-lab.github.io/Sequoia-Page/