Scaling & Rollouts
Horizontal Pod Autoscaler, rolling updates, rollbacks, and debugging - keeping your services up under change
This page covers the day-to-day verbs of operating workloads: scaling them, deploying new versions safely, and debugging when something's wrong.
Manual Scaling
kubectl scale deployment/api-server --replicas=5
# Or via edit
kubectl edit deployment/api-server
Manual scaling is fine for known load patterns. For variable traffic, automate it.
Horizontal Pod Autoscaler (HPA)
The HPA adjusts replica count based on observed metrics — usually CPU or memory.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale up when avg CPU > 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # consider recommendations from the last 60s
      policies:
        - { type: Pods, value: 4, periodSeconds: 60 }      # up to +4 pods/min
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before scaling down
      policies:
        - { type: Percent, value: 10, periodSeconds: 60 }  # max -10% per minute
Requirements:
- The Metrics Server must be installed (kubectl top pods proves it works).
- Each container has resource requests set, because utilization is "actual / requested."
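The "actual / requested" ratio above feeds the HPA's documented scaling rule: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to minReplicas and maxReplicas. A minimal shell sketch of that arithmetic follows. The function name desired_replicas is hypothetical, and the sketch ignores the controller's tolerance band, stabilization windows, and rate policies, so treat the result as a rough estimate only.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
# Utilization values are whole percentages of the pods' requests.
# (Hypothetical helper, not part of kubectl.)
desired_replicas() {
  current=$1
  util=$2
  target=$3
  min=$4
  max=$5
  # Integer ceiling of current * util / target.
  want=$(( (current * util + target - 1) / target ))
  # Clamp to the HPA's minReplicas / maxReplicas.
  if [ "$want" -lt "$min" ]; then want=$min; fi
  if [ "$want" -gt "$max" ]; then want=$max; fi
  echo "$want"
}

# 4 pods averaging 90% CPU against a 70% target, with min 3 and max 20:
desired_replicas 4 90 70 3 20   # prints 6
```

With the manifest above, a sustained 90% average would therefore push the Deployment from 4 toward 6 replicas, subject to the +4 pods/min policy.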
Custom Metrics
For business metrics (requests/sec, queue depth), install Prometheus Adapter, which exposes the results of PromQL queries through the custom metrics API that the HPA reads:
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
Vertical Pod Autoscaler (VPA)
VPA tunes a pod's CPU/memory requests, not the replica count. Useful for workloads you can't easily horizontally scale. Don't use VPA and HPA on the same metric — they'll fight.
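As a rough illustration, a VPA object can run in recommendation-only mode, where it reports suggested requests without evicting pods. The sketch below assumes the VPA components and their CRD (autoscaling.k8s.io/v1) are installed in the cluster, which is not the case by default. The helper name vpa_manifest and the api-server-vpa object name are hypothetical.

```shell
# vpa_manifest DEPLOYMENT: print a recommendation-only VPA for a Deployment.
# (Hypothetical helper; assumes the VPA CRD is installed.)
vpa_manifest() {
  cat <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: $1-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: $1
  updatePolicy:
    updateMode: "Off"   # recommend only; never evict pods to apply changes
EOF
}

# Review the output, then send it to the cluster:
#   vpa_manifest api-server | kubectl apply -f -
# Recommendations should then show up in:
#   kubectl describe vpa api-server-vpa
```

Starting in "Off" mode keeps VPA from interfering with an HPA that already scales the same workload on CPU or memory.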
Rolling Updates
Deployments roll updates by default. Change the image, apply, watch:
kubectl set image deployment/api-server api=myregistry/api-server:v1.2.4
# Or just edit the YAML and re-apply
kubectl rollout status deployment/api-server
# Waits and reports until the new ReplicaSet is fully healthy
The strategy controls how the swap happens:
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # at most 1 extra pod during update
      maxUnavailable: 0   # never go below current replicas
| Setting | Effect |
|---|---|
| maxSurge: 0, maxUnavailable: 1 | Replace in-place, one at a time (briefly degraded) |
| maxSurge: 1, maxUnavailable: 0 | Always have full capacity (safer; needs spare room in the cluster) |
| maxSurge: 25%, maxUnavailable: 25% | Default; quick but partial unavailability |
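Percentage values are resolved against the desired replica count: maxSurge rounds up and maxUnavailable rounds down. The sketch below works through that arithmetic for percentage settings. The function name rollout_bounds is hypothetical, and it does not model absolute pod counts such as maxSurge: 1.

```shell
# rollout_bounds REPLICAS SURGE_PCT UNAVAILABLE_PCT
# Prints the most pods that can exist and the fewest that stay available
# while a rolling update is in progress. (Hypothetical helper.)
rollout_bounds() {
  replicas=$1
  surge=$(( (replicas * $2 + 99) / 100 ))   # maxSurge: round up
  unavailable=$(( replicas * $3 / 100 ))    # maxUnavailable: round down
  echo "max_pods=$(( replicas + surge )) min_available=$(( replicas - unavailable ))"
}

# Default strategy (25% / 25%) on a 10-replica Deployment:
rollout_bounds 10 25 25   # prints: max_pods=13 min_available=8
```

The cluster therefore needs headroom for up to 3 extra pods in this case, while capacity can dip to 8 of 10 during the swap.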
Rollback
If the new version is bad, roll back:
kubectl rollout history deployment/api-server # see revisions
kubectl rollout undo deployment/api-server # to the previous one
kubectl rollout undo deployment/api-server --to-revision=3
A Deployment keeps up to revisionHistoryLimit old ReplicaSets (default 10) for this.
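Before rolling back to a specific revision, it can help to look at what that revision contains. A hedged sketch: the wrapper name rollback_to and the 5m timeout are hypothetical choices, while the kubectl subcommands are the ones shown above plus rollout history --revision, which prints the pod template stored for one revision.

```shell
# rollback_to DEPLOYMENT REVISION
# (Hypothetical wrapper around standard kubectl rollout subcommands.)
rollback_to() {
  # Show the pod template recorded for the revision; stop if kubectl reports an error.
  kubectl rollout history "deployment/$1" --revision="$2" || return 1
  # Roll back to it, then wait for the rollback itself to finish.
  kubectl rollout undo "deployment/$1" --to-revision="$2" || return 1
  kubectl rollout status "deployment/$1" --timeout=5m
}

# Example: rollback_to api-server 3
```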
Pause / Resume
For multi-step changes (image + env + resources), pause the rollout, make all changes, then resume:
kubectl rollout pause deployment/api-server
kubectl set image deployment/api-server api=...
kubectl set env deployment/api-server FEATURE_FLAG=true
kubectl rollout resume deployment/api-server
Beyond Rolling: Blue/Green and Canary
Built-in rolling updates work for most apps. For traffic-shifting deploys, use a controller:
| Tool | Pattern |
|---|---|
| Argo Rollouts | Blue/green, canary, weighted traffic split |
| Flagger | Canary + automated analysis from Prometheus metrics |
| Service Mesh (Istio, Linkerd) | Fine-grained traffic splitting at the L7 layer |
PodDisruptionBudget
Even with rolling updates, voluntary disruptions (node drains, autoscaler activity) can take pods down. A PDB tells K8s "never go below N pods for this workload":
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server
spec:
  minAvailable: 2   # or maxUnavailable
  selector:
    matchLabels: { app: api-server }
Debugging
When something is wrong, work outward from the symptom:
A pod is not running
kubectl get pods # see status
kubectl describe pod hello-xxx # see events
Common statuses:
| Status | Likely cause |
|---|---|
| Pending | No node has room (CPU/memory requests too high) or PVC isn't bound |
| ImagePullBackOff | Wrong image name/tag, registry auth missing |
| CrashLoopBackOff | App crashes on startup (check logs) |
| OOMKilled | Exceeded memory limit |
| Error | Container exited non-zero (check logs) |
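The table can be read as a small triage map from status to the first command worth running. The sketch below encodes one possible mapping. The function name first_check and the choice of command per status are assumptions for illustration, not a fixed procedure.

```shell
# first_check POD STATUS: print a first command to try for a pod status.
# (Hypothetical helper; it only prints the command, it does not run it.)
first_check() {
  case "$2" in
    Pending|ImagePullBackOff|OOMKilled)
      # Scheduling, image-pull, and OOM details show up in the describe output.
      echo "kubectl describe pod $1" ;;
    CrashLoopBackOff)
      # The container keeps restarting, so read the previous attempt's logs.
      echo "kubectl logs $1 --previous" ;;
    Error)
      echo "kubectl logs $1" ;;
    *)
      echo "kubectl get pod $1 -o wide" ;;
  esac
}

first_check hello-xxx CrashLoopBackOff   # prints: kubectl logs hello-xxx --previous
```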
See what's happening
kubectl describe pod hello-xxx # events at the bottom are gold
kubectl logs hello-xxx
kubectl logs hello-xxx --previous # crashed container's last logs
kubectl logs hello-xxx -c sidecar # specific container in the pod
kubectl logs -l app=hello --tail=100 # all pods matching a label
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events --field-selector involvedObject.name=hello-xxx
Get a shell inside
kubectl exec -it hello-xxx -- sh
kubectl exec -it hello-xxx -c sidecar -- sh
# If the image has no shell, ephemeral debug containers:
kubectl debug -it hello-xxx --image=busybox --target=hello
Check resource pressure
kubectl top nodes # CPU/mem per node
kubectl top pods --all-namespaces # CPU/mem per pod
kubectl describe node worker-1 # capacity, allocations, pressure conditions
Connectivity issues
# Open a one-off pod for testing
kubectl run debug --rm -it --image=nicolaka/netshoot -- sh
# Inside:
nslookup api-server # DNS
curl http://api-server # service connectivity
nc -zv postgres 5432 # raw TCP
A Production Rollout, End-to-End
- Build & push image with an immutable tag (v1.2.4, never latest).
- Update the manifest: bump the image tag.
- kubectl apply (or let GitOps do it).
- kubectl rollout status deployment/api-server --timeout=10m in CI; fail the pipeline if it doesn't go green.
- Watch dashboards / alerts for the next ~15 minutes.
- If bad: kubectl rollout undo. Fix forward in git; never edit the cluster out-of-band.
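The rollout gate and rollback steps of the list above could be scripted in CI roughly as follows. This is a sketch under assumptions: the function name deploy_gate is hypothetical, the automatic undo is one possible policy rather than a requirement, and the manifest is assumed to have been applied already.

```shell
# deploy_gate DEPLOYMENT [TIMEOUT]
# Waits for the rollout; on failure rolls back and returns non-zero so the
# CI job fails. (Hypothetical helper around standard kubectl subcommands.)
deploy_gate() {
  deployment=$1
  timeout=${2:-10m}
  if kubectl rollout status "deployment/$deployment" --timeout="$timeout"; then
    echo "rollout of $deployment succeeded"
    return 0
  fi
  echo "rollout of $deployment failed; rolling back" >&2
  kubectl rollout undo "deployment/$deployment"
  return 1
}

# In a pipeline step, after kubectl apply:
#   deploy_gate api-server 10m || exit 1
```

Returning non-zero even after a successful undo keeps the pipeline red, which matches the "fix forward in git" step: the bad tag still has to be replaced in the manifest.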
What's Next
You can deploy, scale, and debug. The last piece is doing it safely and repeatably — security, RBAC, GitOps, and the patterns that keep production calm → Best Practices.