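Right-size instances based on observed utilization metrics (CPU, memory, network, disk I/O), and use autoscaling to follow demand instead of overprovisioning for peak load. Add caching (a CDN for static content, an application cache for repeated reads) to reduce load on the backend, and consider reserved or committed capacity for steady, predictable workloads, where the discount outweighs the loss of flexibility.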
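
As a rough illustration of the right-sizing and committed-capacity advice above, here is a minimal Python sketch. The function names (`percentile`, `rightsize`, `commitment_break_even`) and the 40%/80% CPU thresholds are assumptions made for this example, not provider guidance; real decisions should also weigh memory, network, and burst behavior.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def rightsize(cpu_samples, low=40.0, high=80.0):
    """Return 'downsize', 'upsize', or 'keep' based on p95 CPU utilization (%).

    The default thresholds are illustrative assumptions only.
    """
    p95 = percentile(cpu_samples, 95)
    if p95 < low:
        return "downsize"
    if p95 > high:
        return "upsize"
    return "keep"


def commitment_break_even(on_demand_hourly, committed_hourly):
    """Fraction of hours an instance must run for a commitment to be cheaper.

    Assumes the commitment is billed for every hour, while on-demand is
    billed only for the hours actually used.
    """
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return committed_hourly / on_demand_hourly
```

For example, with a hypothetical on-demand rate of 0.10 per hour and a committed rate of 0.06 per hour, `commitment_break_even(0.10, 0.06)` is about 0.6: the commitment only pays off if the instance runs more than roughly 60% of the time. Workloads below that line are usually better served by autoscaling on demand.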