Observability (easy)

What dashboards are must-have for a critical API?

Answer

At minimum: RED metrics (rate, errors, duration), saturation (CPU/memory), dependency health, and SLO burn-rate. Include breakdowns by route, region, and version.

Advanced answer

Deep dive

Dashboards should answer "are we healthy" and "where is it broken":

RED: request rate, error rate, duration (p50/p95/p99).
Saturation: CPU, memory, thread pools, queue depth.
Dependencies: DB latency, cache hit ratio, downstream status.
SLO: burn-rate over 1h/6h windows.
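The SLO line above can be made concrete with a small calculation. The sketch below treats burn rate as the observed error rate divided by the error budget (1 minus the SLO target) and only alerts when both the 1h and 6h windows exceed a threshold. The function names, the 99.9% target, the request counts, and the threshold of 6 are all illustrative assumptions, not values from any particular system.

```python
def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the error budget (1 - SLO target).

    A value of 1.0 means the budget is being spent exactly as fast as the
    SLO allows; higher values mean it will run out early.
    """
    if total == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    error_budget = 1.0 - slo_target
    return (errors / total) / error_budget


def should_alert(short_window: float, long_window: float, threshold: float) -> bool:
    """Alert only when both windows agree, which filters out brief spikes."""
    return short_window >= threshold and long_window >= threshold


# Hypothetical counts for a 99.9% availability SLO.
SLO_TARGET = 0.999
rate_1h = burn_rate(errors=120, total=10_000, slo_target=SLO_TARGET)  # roughly 12x
rate_6h = burn_rate(errors=300, total=60_000, slo_target=SLO_TARGET)  # roughly 5x

# Assumed threshold; the 6h window is below it, so this spike does not alert.
alert = should_alert(rate_1h, rate_6h, threshold=6.0)
```

Requiring both windows to agree is a design choice: the short window reacts quickly, while the long window confirms the problem is sustained.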

Examples

A good top row:

RPS | Error % | p95/p99 latency | SLO burn-rate
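To show what feeds the first three panels of that row, here is a minimal sketch that derives RPS, error percentage, and p95/p99 latency from raw request samples. The `Request` type, the `top_row` helper, and the choice to count only 5xx responses as errors are assumptions made for illustration; a real dashboard would usually compute these in the metrics backend rather than in application code.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def top_row(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Summarise one time window into the headline numbers."""
    if not requests:
        raise ValueError("no requests in window")
    durations = [r.duration_ms for r in requests]
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "rps": len(requests) / window_seconds,
        "error_pct": 100 * errors / len(requests),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }


# Hypothetical sample: 100 requests over 10 seconds, two of them failing.
sample = [Request(duration_ms=float(i), status=500 if i > 98 else 200)
          for i in range(1, 101)]
row = top_row(sample, window_seconds=10)
```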

Common pitfalls

Too many charts without a narrative.
Missing breakdowns by version or region.
No links to traces/logs from the dashboard.
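The missing-breakdown pitfall is easier to avoid when every request record already carries the dimensions you want to slice by. The sketch below groups error rates by arbitrary label keys; the record shape, the `error_rate_by` helper, and the label names `version` and `region` are hypothetical.

```python
from collections import defaultdict


def error_rate_by(records: list[dict], *keys: str) -> dict[tuple, float]:
    """Error rate per combination of the given label keys."""
    totals: dict[tuple, int] = defaultdict(int)
    errors: dict[tuple, int] = defaultdict(int)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group] += 1
        if rec["status"] >= 500:  # assumption: 5xx responses are errors
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical records: the new version fails only in one region.
records = [
    {"version": "v1", "region": "eu", "status": 200},
    {"version": "v1", "region": "us", "status": 200},
    {"version": "v2", "region": "eu", "status": 500},
    {"version": "v2", "region": "eu", "status": 200},
    {"version": "v2", "region": "us", "status": 200},
]
by_version = error_rate_by(records, "version")
by_version_region = error_rate_by(records, "version", "region")
```

In this sample the aggregate error rate is 20%, while the breakdown shows the failures are confined to `v2` in `eu`, which is the kind of signal a single global chart hides.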

Interview follow-ups

How do you choose time windows for p95/p99?
