Real-time LLM Inference on Standard GPUs: 3k tokens/s per request | Liveview Hacker News

https://blog.kog.ai/real-time-llm-inference-on-standard-gpus-3-000-tokens-s-per-request/

104 points | NicoConstant | 5 hours ago | 51 comments
