Autoscaling adjusts the number of instances based on demand (metrics like CPU, latency, queue depth). A common pitfall is thrashing: scaling up/down too aggressively due to noisy metrics—use cooldowns and proper thresholds.