Steven's Knowledge

Scaling & Rollouts

Horizontal Pod Autoscaler, rolling updates, rollbacks, and debugging - keeping your services up under change

This page covers the day-to-day verbs of operating workloads: scaling them, deploying new versions safely, and debugging when something's wrong.

Manual Scaling

kubectl scale deployment/api-server --replicas=5

# Or via edit
kubectl edit deployment/api-server

Manual scaling is fine for known load patterns. For variable traffic, automate it.

Horizontal Pod Autoscaler (HPA)

The HPA adjusts replica count based on observed metrics — usually CPU or memory.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70           # scale up when avg CPU > 70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60       # consider load over the last 60s
      policies:
        - { type: Pods, value: 4, periodSeconds: 60 }   # up to +4 pods/min
    scaleDown:
      stabilizationWindowSeconds: 300      # wait 5 min before scaling down
      policies:
        - { type: Percent, value: 10, periodSeconds: 60 } # max -10% per minute

Requirements:

  • The Metrics Server must be installed (kubectl top pods proves it works).
  • Each container must have resource requests set, because utilization is computed as actual usage divided by the request.
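
To make "actual / requested" concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation: desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within the default 10% tolerance. The function names and sample numbers are illustrative assumptions, and the sketch ignores the behavior policies, stabilization windows, and multi-metric handling that the real controller applies.

```python
import math

def utilization_percent(usage_millicores, request_millicores):
    """Utilization as the HPA sees it: actual usage relative to the request."""
    return 100 * usage_millicores / request_millicores

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.10):
    """Apply the documented ratio rule, then clamp to minReplicas/maxReplicas."""
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))

# Hypothetical load: pods request 500m CPU and average 450m of usage (90%).
avg = utilization_percent(450, 500)
print(desired_replicas(5, avg, 70, min_replicas=3, max_replicas=20))  # prints 7
```

With the manifest above (target 70%, 3 to 20 replicas), five pods averaging 90% CPU would be scaled toward seven, subject to the scaleUp policy limit.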

Custom Metrics

For business metrics (requests/sec, queue depth), install Prometheus Adapter, which exposes the results of PromQL queries through the custom metrics API that the HPA reads:

metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"

Vertical Pod Autoscaler (VPA)

VPA tunes a pod's CPU/memory requests, not the replica count. Useful for workloads you can't easily horizontally scale. Don't use VPA and HPA on the same metric — they'll fight.

Rolling Updates

Deployments roll updates by default. Change the image, apply, watch:

kubectl set image deployment/api-server api=myregistry/api-server:v1.2.4
# Or just edit the YAML and re-apply

kubectl rollout status deployment/api-server
# Waits and reports until the new ReplicaSet is fully healthy

The strategy controls how the swap happens:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                          # at most 1 extra pod during update
      maxUnavailable: 0                    # never drop below the desired replica count

Setting                              Effect
maxSurge: 0, maxUnavailable: 1       Replace in-place, one at a time (briefly degraded)
maxSurge: 1, maxUnavailable: 0       Always have full capacity (safer; needs spare room in the cluster)
maxSurge: 25%, maxUnavailable: 25%   Default; quick but partial unavailability
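
Percentages are resolved against the desired replica count: maxSurge rounds up and maxUnavailable rounds down. The sketch below works out the resulting pod-count bounds for a few settings. The helper names are made up for illustration, and it does not model the API server's validation (for example, both values cannot be zero).

```python
import math

def resolve(value, replicas, round_up):
    """Turn an absolute count or a percentage string like "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        exact = replicas * int(value[:-1]) / 100
        return math.ceil(exact) if round_up else math.floor(exact)
    return int(value)

def rollout_bounds(replicas, max_surge, max_unavailable):
    """Upper bound on total pods and lower bound on available pods during a rollout."""
    surge = resolve(max_surge, replicas, round_up=True)                # maxSurge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)   # maxUnavailable rounds down
    return {"max_total_pods": replicas + surge,
            "min_available_pods": replicas - unavailable}

print(rollout_bounds(10, "25%", "25%"))
# prints {'max_total_pods': 13, 'min_available_pods': 8}
```

So a 10-replica Deployment on the defaults may briefly run 13 pods and may dip to 8 available ones, which is worth checking against spare cluster capacity and your traffic headroom.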

Rollback

If the new version is bad, roll back:

kubectl rollout history deployment/api-server          # see revisions
kubectl rollout undo deployment/api-server             # to the previous one
kubectl rollout undo deployment/api-server --to-revision=3

A Deployment keeps up to revisionHistoryLimit old ReplicaSets (default 10) so that there is something to roll back to.

Pause / Resume

For multi-step changes (image + env + resources), pause the rollout, make all changes, then resume:

kubectl rollout pause deployment/api-server
kubectl set image deployment/api-server api=...
kubectl set env deployment/api-server FEATURE_FLAG=true
kubectl rollout resume deployment/api-server

Beyond Rolling: Blue/Green and Canary

Built-in rolling updates work for most apps. For traffic-shifting deploys, use a controller:

Tool                            Pattern
Argo Rollouts                   Blue/green, canary, weighted traffic split
Flagger                         Canary + automated analysis from Prometheus metrics
Service Mesh (Istio, Linkerd)   Fine-grained traffic splitting at the L7 layer

PodDisruptionBudget

Even with rolling updates, voluntary disruptions (node drains, cluster autoscaler scale-down) can take pods down. A PDB tells K8s "never evict below N available pods for this workload":

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server
spec:
  minAvailable: 2                          # or maxUnavailable
  selector:
    matchLabels: { app: api-server }
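
The arithmetic behind minAvailable is simple but easy to get wrong in practice. A minimal sketch, assuming the budget is expressed as an absolute minAvailable as in the manifest above (the function name is illustrative):

```python
def disruptions_allowed(healthy_pods, min_available):
    """Voluntary evictions a minAvailable PDB permits right now (never negative)."""
    return max(0, healthy_pods - min_available)

for healthy in (3, 2, 1):
    allowed = disruptions_allowed(healthy, min_available=2)
    print(healthy, "healthy ->", allowed, "eviction(s) allowed")
```

Note the middle case: if the Deployment only runs 2 replicas and minAvailable is 2, no eviction is ever allowed and a node drain will block until you intervene. Keep replicas above minAvailable.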

Debugging

When something is wrong, work outward from the symptom:

A pod is not running

kubectl get pods                           # see status
kubectl describe pod hello-xxx             # see events

Common statuses:

Status             Likely cause
Pending            No node has room (CPU/memory requests too high) or PVC isn't bound
ImagePullBackOff   Wrong image name/tag, registry auth missing
CrashLoopBackOff   App crashes on startup (check logs)
OOMKilled          Exceeded memory limit
Error              Container exited non-zero (check logs)
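
If you want to pull these statuses out programmatically, they live in the pod's status block as returned by kubectl get pod <name> -o json. The sketch below is one possible way to read it; the triage function and the hand-written sample pod are hypothetical, and only the fields shown are inspected.

```python
def triage(pod):
    """Return (source, reason) pairs for the common failure states in a pod object."""
    status = pod.get("status", {})
    findings = []
    # A Pending pod with no container statuses has not been scheduled or started yet.
    if status.get("phase") == "Pending" and not status.get("containerStatuses"):
        findings.append(("pod", "Pending"))
    for cs in status.get("containerStatuses", []):
        # state.waiting.reason carries ImagePullBackOff, CrashLoopBackOff, and similar.
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason"):
            findings.append((cs["name"], waiting["reason"]))
        # lastState.terminated.reason shows why the previous run of the container ended.
        terminated = cs.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            findings.append((cs["name"], "OOMKilled"))
    return findings

# Hand-written sample, trimmed to the fields the function reads.
sample_pod = {
    "status": {
        "phase": "Running",
        "containerStatuses": [{
            "name": "hello",
            "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
        }],
    }
}
print(triage(sample_pod))
# prints [('hello', 'CrashLoopBackOff'), ('hello', 'OOMKilled')]
```

Here the container is crash-looping because its previous run was OOM-killed, which points at the memory limit rather than at the application's startup code.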

See what's happening

kubectl describe pod hello-xxx             # events at the bottom are gold
kubectl logs hello-xxx
kubectl logs hello-xxx --previous          # crashed container's last logs
kubectl logs hello-xxx -c sidecar          # specific container in the pod
kubectl logs -l app=hello --tail=100       # all pods matching a label

kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events --field-selector involvedObject.name=hello-xxx

Get a shell inside

kubectl exec -it hello-xxx -- sh
kubectl exec -it hello-xxx -c sidecar -- sh

# If the image has no shell, ephemeral debug containers:
kubectl debug -it hello-xxx --image=busybox --target=hello

Check resource pressure

kubectl top nodes                          # CPU/mem per node
kubectl top pods --all-namespaces          # CPU/mem per pod
kubectl describe node worker-1             # capacity, allocations, pressure conditions

Connectivity issues

# Open a one-off pod for testing
kubectl run debug --rm -it --image=nicolaka/netshoot -- sh

# Inside:
nslookup api-server                        # DNS
curl http://api-server                     # service connectivity
nc -zv postgres 5432                       # raw TCP

A Production Rollout, End-to-End

  1. Build & push image with an immutable tag (v1.2.4, never latest).
  2. Update the manifest — bump the image tag.
  3. kubectl apply (or let GitOps do it).
  4. kubectl rollout status deployment/api-server --timeout=10m in CI; fail the pipeline if it doesn't go green.
  5. Watch dashboards / alerts for the next ~15 minutes.
  6. If bad: kubectl rollout undo. Fix forward in git; never edit the cluster out-of-band.
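
Steps 4 and 6 can be wired together in CI. The following is a rough sketch, not a definitive pipeline: gate_rollout is a hypothetical helper that shells out to the two kubectl commands used above and relies on kubectl rollout status exiting non-zero when the timeout is hit. The run parameter exists only so the logic can be exercised without a cluster.

```python
import subprocess

def gate_rollout(deployment, timeout="10m", run=subprocess.run):
    """Return True if the rollout went green; otherwise undo it and return False."""
    status = run(["kubectl", "rollout", "status",
                  f"deployment/{deployment}", f"--timeout={timeout}"])
    if status.returncode == 0:
        return True
    # Rollout did not become healthy in time: go back to the previous revision.
    run(["kubectl", "rollout", "undo", f"deployment/{deployment}"])
    return False

# In a CI step (requires kubectl and cluster access):
#   import sys
#   sys.exit(0 if gate_rollout("api-server") else 1)
```

Whether an automatic undo is appropriate depends on your setup; with GitOps you may prefer to fail the pipeline and revert the commit instead, so the cluster and the repository stay in agreement.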

What's Next

You can deploy, scale, and debug. The last piece is doing it safely and repeatably — security, RBAC, GitOps, and the patterns that keep production calm → Best Practices.
