You want low, predictable latency with tight tails, i.e. high percentiles such as p99 staying close to the median.
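One way to make "tight tails" concrete is to compare a high percentile against the median. The following is a minimal sketch that assumes you already have per-request latencies in a list. `percentile` and `tail_ratio` are hypothetical helpers, and nearest-rank is only one of several common percentile definitions.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Multiply before dividing so integer p and n give an exact rank.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def tail_ratio(samples, p=99):
    """Ratio of the p-th percentile to the median; near 1.0 means a tight tail."""
    return percentile(samples, p) / percentile(samples, 50)
```

With `steady = [100] * 98 + [110, 120]` the p99/p50 ratio is 1.1. Replace the last two samples with 1000 and the median is unchanged, but the ratio jumps to 10.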
If all you care about is throughput, deep pipelines + lots of threads will get you there, but at the cost of latency.
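Little's law (mean latency W = in-flight work L / throughput X) explains why this trade-off happens. The sketch below uses a toy model, which is an assumption and not a measurement: a system with a 1 ms unloaded latency and a bottleneck that caps throughput at 10 requests/ms. The names `throughput_per_ms` and `mean_latency_ms` are hypothetical.

```python
def throughput_per_ms(in_flight, base_latency_ms, capacity_per_ms):
    # Below saturation, each extra in-flight request adds throughput;
    # once the bottleneck is saturated, throughput is capped.
    return min(in_flight / base_latency_ms, capacity_per_ms)


def mean_latency_ms(in_flight, base_latency_ms, capacity_per_ms):
    # Little's law: W = L / X.
    return in_flight / throughput_per_ms(in_flight, base_latency_ms, capacity_per_ms)


if __name__ == "__main__":
    # More threads / deeper pipelines mean more requests in flight.
    for n in (1, 10, 100, 1000):
        x = throughput_per_ms(n, 1, 10)
        w = mean_latency_ms(n, 1, 10)
        print(f"in_flight={n:5d}  throughput={x:5.1f}/ms  latency={w:6.1f} ms")
```

Up to 10 requests in flight, throughput rises and latency stays at 1 ms. Beyond that point, throughput stays at 10/ms, and each additional in-flight request only adds queueing delay, reaching 100 ms at 1000 in flight.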