Architecture (medium)

Why do teams watch p95/p99 latency, not just average latency?

Answer

Averages hide tail latency: a few very slow requests can be invisible in the mean but painful for the users who hit them. p95 and p99 are the latencies that 95% and 99% of requests complete within, so they show how the slowest 5% and 1% behave and help catch queueing and saturation issues.
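To make this concrete, here is a minimal sketch that compares the mean with p50/p95/p99 on a small sample. The latency values are made up for illustration, the `percentile` and `mean` helpers are hypothetical names, and nearest-rank is only one of several common percentile definitions.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// the samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) {
    throw new Error("percentile of an empty sample is undefined");
  }
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort, copy first
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100)); // 1-based
  return sorted[rank - 1];
}

function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Hypothetical latencies in milliseconds: 94 fast requests and 6 slow ones.
const latenciesMs = [...Array(94).fill(20), 800, 900, 1000, 1200, 1500, 2500];

const summary = {
  mean: mean(latenciesMs),
  p50: percentile(latenciesMs, 50),
  p95: percentile(latenciesMs, 95),
  p99: percentile(latenciesMs, 99),
};
console.log(summary);
```

With these made-up numbers the mean is 97.8 ms, while p50 is 20 ms, p95 is 800 ms and p99 is 1500 ms. The mean matches no request that actually happened: it overstates the typical case and understates the slow one.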

Advanced answer

Deep dive

Expanding on the short answer — what usually matters in practice:

Context (tags): latency, p99, performance, observability
Scaling: what scales horizontally vs vertically, where bottlenecks appear.
Reliability: retries/circuit breakers/idempotency, observability (logs/metrics/traces).
Evolution: keep changes cheap (boundaries, contracts, tests).
Explain the "why", not just the "what" (intuition + consequences).
Trade-offs: what you gain/lose (time, memory, complexity, risk).
Edge cases: empty inputs, large inputs, invalid inputs, concurrency.
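For the observability point above, here is a minimal sketch of how p95/p99 might be estimated from bucketed counts instead of raw samples. It is not taken from any particular metrics library: the bucket bounds and the names `createHistogram`, `observe`, `merge` and `estimatePercentile` are all hypothetical, and the estimate is only as precise as the bucket widths.

```javascript
// Hypothetical bucket upper bounds in milliseconds; the last bucket is open-ended.
const BUCKET_BOUNDS_MS = [10, 25, 50, 100, 250, 500, 1000, 2500, Infinity];

function createHistogram() {
  return { counts: new Array(BUCKET_BOUNDS_MS.length).fill(0), total: 0 };
}

// Record one latency by incrementing the first bucket whose bound covers it.
function observe(histogram, latencyMs) {
  const i = BUCKET_BOUNDS_MS.findIndex((bound) => latencyMs <= bound);
  if (i === -1) {
    throw new Error("invalid latency: " + latencyMs); // e.g. NaN
  }
  histogram.counts[i] += 1;
  histogram.total += 1;
}

// Combine histograms from several hosts by adding counts bucket by bucket.
function merge(a, b) {
  return {
    counts: a.counts.map((count, i) => count + b.counts[i]),
    total: a.total + b.total,
  };
}

// Estimate the p-th percentile as the upper bound of the bucket that holds
// the nearest-rank observation. This is an upper estimate, not an exact value.
function estimatePercentile(histogram, p) {
  if (histogram.total === 0) {
    throw new Error("no observations recorded");
  }
  const rank = Math.max(1, Math.ceil((p * histogram.total) / 100));
  let seen = 0;
  for (let i = 0; i < histogram.counts.length; i++) {
    seen += histogram.counts[i];
    if (seen >= rank) {
      return BUCKET_BOUNDS_MS[i];
    }
  }
  return Infinity; // unreachable when counts and total are consistent
}

// Usage: host A is healthy, host B has a slow tail (made-up traffic).
const hostA = createHistogram();
const hostB = createHistogram();
for (let i = 0; i < 100; i++) observe(hostA, 20);
for (let i = 0; i < 80; i++) observe(hostB, 20);
for (let i = 0; i < 20; i++) observe(hostB, 800);

const fleet = merge(hostA, hostB);
console.log(estimatePercentile(fleet, 50), estimatePercentile(fleet, 95));
```

The design choice in this sketch is to add bucket counts across hosts before estimating, so the percentile describes the combined traffic instead of any single host.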

Examples

A tiny example (an explanation template):

// Example: discuss trade-offs for "why-do-teams-watch-p95/p99-latency,-not-just-ave"
function explain() {
  // Start from the core idea:
  // Averages hide tail latency: a few very slow requests can be invisible in the mean
  // but painful for users.
  return "Averages hide tail latency; p95/p99 show how the slowest 5%/1% behave.";
}

Common pitfalls

Too generic: no concrete trade-offs or examples.
Mixing average-case and worst-case (e.g., complexity).
Ignoring constraints: memory, concurrency, network/disk costs.

Interview follow-ups

When would you choose an alternative and why?
What production issues show up and how do you diagnose them?
