Hacker News

Real-time LLM Inference on Standard GPUs: 3k tokens/s per request

https://blog.kog.ai/real-time-llm-inference-on-standard-gpus-3-000-tokens-s-per-request/