At minimum: RED metrics (rate, errors, duration), saturation (CPU/memory), dependency health, and SLO burn-rate. Include breakdowns by route, region, and version.
Dashboards should answer two questions: "are we healthy?" and "where is it broken?"
A good top row:
RPS | Error % | p95/p99 latency | SLO burn-rate
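
As a rough illustration of what feeds those four tiles, here is a minimal Python sketch that derives them from raw counts for a single time window. The `WindowStats` shape, the `top_row` helper, and the 99.9% default target are assumptions made up for this example, not part of any particular monitoring tool. Burn rate is taken here as the observed error ratio divided by the error budget (1 minus the SLO target), which is a common definition; your SLO tooling may compute it differently.

```python
from dataclasses import dataclass, field


@dataclass
class WindowStats:
    """Raw observations for one time window (hypothetical shape)."""
    total_requests: int
    failed_requests: int
    window_seconds: float
    latencies_ms: list = field(default_factory=list)


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100; 0.0 if empty."""
    if not values:
        return 0.0
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, so no float rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def top_row(stats, slo_target=0.999):
    """Compute RPS, error %, p95/p99 latency, and SLO burn rate."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    rps = stats.total_requests / stats.window_seconds
    if stats.total_requests:
        error_ratio = stats.failed_requests / stats.total_requests
    else:
        error_ratio = 0.0

    # Error budget is the fraction of requests allowed to fail under the SLO.
    error_budget = 1.0 - slo_target
    burn_rate = error_ratio / error_budget

    return {
        "rps": rps,
        "error_pct": error_ratio * 100,
        "p95_ms": percentile(stats.latencies_ms, 95),
        "p99_ms": percentile(stats.latencies_ms, 99),
        "burn_rate": burn_rate,
    }


if __name__ == "__main__":
    sample = WindowStats(
        total_requests=6000,
        failed_requests=12,
        window_seconds=60.0,
        latencies_ms=[10.0] * 95 + [200.0] * 4 + [900.0],
    )
    for name, value in top_row(sample).items():
        print(f"{name}: {value:.3f}")
```

Under this definition, a burn rate of 1 means the error budget would be used up exactly at the end of the SLO period, and anything above 1 means it is being spent faster than that. The same function can be run per route, region, or version to produce the breakdowns.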