Use timeouts and circuit breakers, and keep retries bounded, with exponential backoff and jitter so that clients do not all retry at the same moment. Also consider bulkheads (a concurrency limit per dependency) so that one failing dependency cannot exhaust all threads or connections.
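A minimal Python sketch of how these pieces could look, using only the standard library. All names here (`Bulkhead`, `CircuitBreaker`, `call_with_retries`, `backoff_delays`) and the default thresholds are hypothetical, chosen for illustration rather than taken from any particular library. The breaker is deliberately simplified and not thread-safe, and the timeout itself is assumed to be set on the underlying client call.

```python
import random
import threading
import time


class BulkheadFull(Exception):
    """Raised when the per-dependency concurrency limit is reached."""


class CircuitOpen(Exception):
    """Raised when the breaker is open and calls are being rejected."""


class Bulkhead:
    """Caps concurrent calls to one dependency (hypothetical helper)."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn):
        # Fail fast instead of queueing when every slot is taken.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull("concurrency limit reached")
        try:
            return fn()
        finally:
            self._slots.release()


class CircuitBreaker:
    """Simplified breaker: opens after N consecutive failures (not thread-safe)."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, fn):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpen("dependency marked unhealthy")
            # Reset window elapsed: let a trial call through (half-open).
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self._opened_at = None
        return result


def backoff_delays(max_retries, base=0.1, cap=2.0, rng=random.random):
    """Yield one jittered delay per retry: rng() * min(cap, base * 2**attempt)."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retries(fn, max_retries=3, base=0.1, cap=2.0,
                      sleep=time.sleep, retry_on=(Exception,)):
    """Call fn, retrying at most max_retries times on the given exceptions."""
    delays = backoff_delays(max_retries, base, cap)
    while True:
        try:
            return fn()
        except retry_on:
            delay = next(delays, None)
            if delay is None:
                raise  # retry budget exhausted: surface the last error
            sleep(delay)
```

One plausible way to compose them is retries on the outside, then the breaker, then the bulkhead, with the timeout on the innermost call. In that arrangement it usually makes sense to pass a narrower `retry_on` so that `CircuitOpen` and `BulkheadFull` are not retried.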