Steven's Knowledge

Getting Started

Install Velero on kind, backup and restore a workload, PostgreSQL PITR, restic offsite backup, failover exercise

This page walks through the most common DR loops: backing up and restoring a Kubernetes workload with Velero, point-in-time recovery (PITR) on PostgreSQL, and an offsite encrypted backup with restic.

Prerequisites

# Requires a running Docker daemon (kind and the PostgreSQL demo both use it)
brew install velero restic postgresql kind helm kubectl
kind create cluster --name dr

Velero: K8s Backup to MinIO

Velero needs an object store to hold its backups. For local testing, spin up MinIO inside the cluster.

# MinIO as an S3-compatible target (the Bitnami chart nests credentials under auth.*)
helm install minio --set auth.rootUser=minio,auth.rootPassword=minio123 \
  --set defaultBuckets=velero --set persistence.enabled=true \
  oci://registry-1.docker.io/bitnamicharts/minio

Install Velero with the AWS plugin (works with any S3-compatible storage):

cat <<EOF > credentials-velero
[default]
aws_access_key_id = minio
aws_secret_access_key = minio123
EOF

velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.10.0 \
  --bucket velero \
  --secret-file ./credentials-velero \
  --use-volume-snapshots=false \
  --backup-location-config region=minio,s3ForcePathStyle=true,s3Url=http://minio.default:9000

kubectl get pods -n velero
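
Before taking a backup, it can help to wait until Velero reports the storage location as usable. The sketch below is a generic polling helper; `wait_for` and `WAIT_INTERVAL` are hypothetical names, not part of Velero, and the usage comment assumes the location created by `velero install` is named `default` and reports its state in `.status.phase`.

```shell
# wait_for EXPECTED ATTEMPTS CMD [ARGS...]
# Re-runs CMD until its stdout equals EXPECTED, or gives up after ATTEMPTS tries.
wait_for() {
  expected=$1
  attempts=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # A failing command counts as "not ready yet" rather than a hard error
    actual=$("$@" 2>/dev/null) || actual=""
    if [ "$actual" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "${WAIT_INTERVAL:-2}"
  done
  return 1
}

# Usage (assumes the storage location is named "default"):
# wait_for Available 30 kubectl get backupstoragelocation default \
#   -n velero -o jsonpath='{.status.phase}'
```

If the helper returns nonzero, checking the MinIO URL and credentials is a reasonable first step.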

Deploy a Workload

kubectl create namespace demo
kubectl apply -n demo -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata: { name: hello }
spec:
  replicas: 2
  selector: { matchLabels: { app: hello } }
  template:
    metadata: { labels: { app: hello } }
    spec:
      containers:
        - name: hello
          image: nginxdemos/hello:plain-text
          ports: [{ containerPort: 80 }]
---
apiVersion: v1
kind: ConfigMap
metadata: { name: app-config }
data:
  greeting: "hello from the original deployment"
EOF

Take a Backup

BACKUP="demo-backup-$(date +%s)"
velero backup create "$BACKUP" \
  --include-namespaces demo

# Watch status
velero backup describe "$BACKUP" --details

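The backup captures every resource in the demo namespace: here the Deployment (with its ReplicaSet and Pods) and the ConfigMap, and in a fuller application also Services and PVCs. Volume contents are included only when volume snapshots or file-system backup are enabled; this install sets --use-volume-snapshots=false.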

Simulate a Disaster

kubectl delete namespace demo

# Confirm it's gone
kubectl get all -n demo
# No resources found

Restore

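# Pick the newest backup by creation time
BACKUP=$(kubectl get backups.velero.io -n velero \
  --sort-by=.metadata.creationTimestamp -o name | tail -1 | cut -d/ -f2)
velero restore create "$BACKUP-restore" --from-backup "$BACKUP"

# Watch
velero restore describe "$BACKUP-restore"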

# Verify
kubectl get all -n demo
kubectl get configmap -n demo app-config -o yaml | grep greeting

The Deployment and ConfigMap reappear, so the backup → delete → restore loop works. You've done your first DR test.
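
Eyeballing `kubectl get all` works for two objects but not for a real namespace. A minimal sketch of an inventory comparison follows, assuming you saved the output of `kubectl get deploy,configmap -n demo -o name` to one file before the disaster and to another after the restore; `missing_after_restore` and the file names are hypothetical.

```shell
# missing_after_restore BEFORE_FILE AFTER_FILE
# Prints every resource name listed in BEFORE_FILE that is absent from AFTER_FILE.
# Returns 0 when nothing is missing, 1 otherwise.
missing_after_restore() {
  tmpdir=$(mktemp -d) || return 2
  sort -u "$1" > "$tmpdir/before"
  sort -u "$2" > "$tmpdir/after"
  # comm -23 keeps only the lines unique to the first (sorted) file
  missing=$(comm -23 "$tmpdir/before" "$tmpdir/after")
  rm -rf "$tmpdir"
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}

# Usage:
# kubectl get deploy,configmap -n demo -o name > before.txt   # before the delete
# kubectl get deploy,configmap -n demo -o name > after.txt    # after the restore
# missing_after_restore before.txt after.txt
```

Comparing names only is a deliberate simplification: it shows that objects came back, not that their contents match.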

Schedule Recurring Backups

velero schedule create daily-demo \
  --schedule="0 2 * * *" \
  --include-namespaces demo \
  --ttl 720h0m0s    # 30 days retention

This creates a daily backup at 02:00 UTC, retained for 30 days. Combine it with quarterly long-retention backups (--ttl 8760h, about one year) for compliance.
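
Retention policies are usually stated in days, while the TTL is written in hours. A small sketch of the conversion follows; `ttl_from_days` and the `quarterly-demo` schedule in the usage comment are hypothetical names chosen for illustration.

```shell
# ttl_from_days DAYS
# Prints DAYS as a duration string in hours, the form used by --ttl above.
ttl_from_days() {
  case $1 in
    ''|0?*|*[!0-9]*)
      echo "ttl_from_days: expected a whole number of days without leading zeros" >&2
      return 1
      ;;
  esac
  echo "$(( $1 * 24 ))h0m0s"
}

# Usage (03:00 on the first day of every third month):
# velero schedule create quarterly-demo --schedule="0 3 1 */3 *" \
#   --include-namespaces demo --ttl "$(ttl_from_days 365)"
```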

PostgreSQL Point-in-Time Recovery

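Don't rely on pg_dump as your only backup: a dump restores only to the moment it was taken. PITR lets you recover to any moment after a base backup, as long as the WAL archive covers it.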

Set up base backup + WAL archiving

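# Run a PG container locally (the third mount exposes base backups to the host)
docker run -d --name pg \
  -e POSTGRES_PASSWORD=demo \
  -v $(pwd)/pgdata:/var/lib/postgresql/data \
  -v $(pwd)/pgarchive:/archive \
  -v $(pwd)/pgdata-backups:/var/lib/postgresql/backups \
  postgres:16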

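# Configure WAL archiving
# Wait until first-time initialization has finished before running this
# (docker logs pg shows "ready to accept connections")
docker exec pg bash -c "
chown postgres /archive   # the postgres user must be able to write here
echo \"archive_mode = on\" >> /var/lib/postgresql/data/postgresql.conf
echo \"archive_command = 'cp %p /archive/%f'\" >> /var/lib/postgresql/data/postgresql.conf
echo \"wal_level = replica\" >> /var/lib/postgresql/data/postgresql.conf
"
docker restart pg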
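
The plain `cp` archive command above will overwrite a segment that is already in the archive. The PostgreSQL documentation advises that an archive command refuse to overwrite existing files and return nonzero on any failure. A minimal sketch of such a wrapper follows; `archive_wal` and the script path in the usage comment are hypothetical.

```shell
# archive_wal SRC_PATH DEST_DIR
# Copies one WAL segment into DEST_DIR. Fails (nonzero) if the segment is
# already archived or the copy does not complete, so PostgreSQL will retry.
archive_wal() {
  src=$1
  dest="$2/$(basename "$src")"
  if [ -e "$dest" ]; then
    echo "archive_wal: $dest already exists, refusing to overwrite" >&2
    return 1
  fi
  # Copy to a temporary name first so a partial file never looks complete
  cp "$src" "$dest.tmp" && mv "$dest.tmp" "$dest"
}

# Usage, if saved as a script that ends with: archive_wal "$@"
# archive_command = '/usr/local/bin/archive_wal.sh %p /archive'
```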

Take a base backup:

docker exec pg pg_basebackup -U postgres -D /var/lib/postgresql/backups/base-$(date +%Y%m%d) -Ft -z -P

Insert some data over time:

docker exec -i pg psql -U postgres <<EOF
CREATE TABLE events (id serial PRIMARY KEY, at timestamp DEFAULT now(), msg text);
INSERT INTO events (msg) VALUES ('1pm: hello'), ('2pm: world');
EOF

Disaster: drop the table

docker exec -i pg psql -U postgres -c "DROP TABLE events;"
docker exec -i pg psql -U postgres -c "SELECT * FROM events;"  # Errors

Restore to point-in-time

Stop Postgres, restore base, configure recovery target:

docker stop pg
rm -rf pgdata && mkdir pgdata
tar xzf $(ls -t pgdata-backups/base-*/base.tar.gz | head -1) -C pgdata

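# Create recovery.signal + recovery target
touch pgdata/recovery.signal
# recovery_target_time below is an example: use a moment after the base backup
# finished and before the DROP TABLE, in the server's time zone
cat >> pgdata/postgresql.conf <<EOF
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2026-05-21 14:30:00'
recovery_target_action = 'promote'
EOF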

docker start pg

# Verify - the table is back, with rows up to your recovery point
docker exec -i pg psql -U postgres -c "SELECT * FROM events;"

In managed RDS / Cloud SQL this is a single API call: "restore to point in time X." But the underlying mechanism is the same.
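
A recovery target only works if it falls after the base backup finished and before the destructive statement ran. The sketch below checks that ordering for timestamps in the fixed-width `YYYY-MM-DD HH:MM:SS` form used above, assuming all three values share one time zone; `target_in_window` is a hypothetical helper and the sample times are invented.

```shell
# target_in_window BACKUP_END TARGET DISASTER
# Succeeds only if BACKUP_END < TARGET < DISASTER.
# Fixed-width timestamps sort chronologically when compared as plain strings.
target_in_window() {
  # sort -c -u succeeds only if the lines are already in strictly ascending order
  printf '%s\n' "$1" "$2" "$3" | LC_ALL=C sort -c -u 2>/dev/null
}

# Usage (record the current server time before and after each step):
# docker exec pg psql -U postgres -Atc \
#   "SELECT to_char(now(), 'YYYY-MM-DD HH24:MI:SS')"
# target_in_window '2026-05-21 13:00:00' '2026-05-21 14:30:00' '2026-05-21 15:00:00' \
#   && echo "target is recoverable"
```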

Offsite Encrypted Backup with restic

For files (configs, code, certificates), use restic. It is encrypted, deduplicated, and supports many storage backends.

# Initialize a restic repo on S3-compatible storage
# (expose the in-cluster MinIO first: kubectl port-forward svc/minio 9000:9000 &)
export RESTIC_REPOSITORY="s3:http://localhost:9000/restic"
export RESTIC_PASSWORD="mystrongpassword"
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123

restic init

# First backup
restic backup ~/important-configs

# Subsequent backups - only changes are uploaded
restic backup ~/important-configs

# List snapshots
restic snapshots

# Restore (restic recreates the full absolute path under --target)
restic restore latest --target /tmp/restore-test
diff -r ~/important-configs "/tmp/restore-test$HOME/important-configs"

restic encrypts on the client, so the server (S3) never sees plaintext. The repository password is the only thing between an attacker and your data, so secure it accordingly. If you lose it, the backups cannot be recovered.
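
One way to keep the password out of shell history and exported variables is restic's `RESTIC_PASSWORD_FILE` environment variable, which points at a file holding the password. A minimal sketch follows, assuming a system with `/dev/urandom`; `new_restic_password_file` and the path in the usage comment are hypothetical.

```shell
# new_restic_password_file PATH
# Writes a random 64-hex-character password to PATH, readable only by the owner.
# Refuses to overwrite an existing file, since that would orphan the repository.
new_restic_password_file() {
  if [ -e "$1" ]; then
    echo "new_restic_password_file: $1 already exists" >&2
    return 1
  fi
  (
    umask 077   # the file is created with mode 600
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$1"
  )
}

# Usage:
# new_restic_password_file ~/.restic-password
# export RESTIC_PASSWORD_FILE=~/.restic-password
# restic init
```

Keep a copy of the file somewhere other than the machine being backed up.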

A Mini Failover Exercise

Pretend your primary region is gone. Walk through the steps without panic:

  1. Detect: alert fires; on-call confirms region is unreachable.
  2. Declare: "We are in a regional outage. Failing over to us-west-2."
  3. DNS update: Route53 health check fails → traffic shifts to standby region.
  4. Database promotion: fence the old primary (it is unreachable, so it cannot be demoted, but it must not accept writes if it returns); promote the read replica in us-west-2.
  5. Verify: synthetic transactions succeed; metrics flow.
  6. Communicate: status page updated; customer messaging sent.

Time this end-to-end. The difference between your measured RTO and your target RTO is the gap to close.
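
The timing can be captured with two epoch timestamps. A small sketch of the arithmetic follows; `rto_gap` is a hypothetical helper and the 900-second target in the usage comment is an example value, not a recommendation.

```shell
# rto_gap START_EPOCH END_EPOCH TARGET_SECONDS
# Prints the measured recovery time, the target, and the difference.
# A positive gap means the exercise took longer than the target.
rto_gap() {
  actual=$(( $2 - $1 ))
  echo "actual=${actual}s target=${3}s gap=$(( actual - $3 ))s"
}

# Usage during the exercise:
# start=$(date +%s)    # when the alert fires
# ...run steps 1-6...
# end=$(date +%s)      # when synthetic transactions succeed
# rto_gap "$start" "$end" 900
```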

Cleanup

helm uninstall minio
kind delete cluster --name dr
docker rm -f pg
rm -rf pgdata pgarchive pgdata-backups credentials-velero /tmp/restore-test

What's Next

  • Patterns — multi-region, snapshot lifecycle, immutable backups, runbooks, GameDay
  • Best Practices — testing cadence, RTO measurement, encryption, compliance, pitfalls
