That is absolutely not the case for any system of size and scale. It would just burn out the on-call team without producing any improvements. Error rates/budgets are used instead.
It depends on what you're monitoring. If it's response codes from user-generated queries, then I'd agree with you.

But if it is synthetic queries sent from the monitoring platform, then you control the user agent, payload, and endpoints. So any failed request is a symptom of a misconfiguration and/or failure that should be investigated, albeit not necessarily as a P1.
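
To make the distinction concrete, here is a rough sketch of the two alerting policies. Everything in it is hypothetical: the `Window` type, the function names, and the 99.9% SLO target are assumptions for illustration, not anything from a real monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts observed over one evaluation window (hypothetical type)."""
    total: int
    failed: int


def user_traffic_alert(window: Window, slo_target: float = 0.999) -> bool:
    """User-generated traffic: alert only when the error rate exceeds
    the budget implied by the SLO target (assumed to be 99.9% here)."""
    if window.total == 0:
        return False  # no traffic, nothing to evaluate
    error_rate = window.failed / window.total
    return error_rate > (1 - slo_target)


def synthetic_probe_alert(window: Window) -> bool:
    """Synthetic probes: we control the request, so any failure is worth flagging."""
    return window.failed > 0


# 50 failures in 100,000 user requests stays inside a 0.1% budget.
print(user_traffic_alert(Window(total=100_000, failed=50)))
# A single failed probe out of 100 is still flagged for investigation.
print(synthetic_probe_alert(Window(total=100, failed=1)))
```

The flag from `synthetic_probe_alert` doesn't have to page anyone; it could just as well open a lower-priority ticket.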