Getting Started
Install Velero on kind, backup and restore a workload, PostgreSQL PITR, restic offsite backup, failover exercise
This page walks through the most common DR loops: backing up and restoring a Kubernetes workload with Velero, point-in-time recovery (PITR) on Postgres, and an offsite encrypted backup with restic. It ends with a short failover exercise.
Prerequisites
brew install velero restic postgresql
kind create cluster --name dr
Velero: K8s Backup to MinIO
Velero needs an object store to hold its backups. For a local setup, spin up MinIO.
# MinIO as an S3-compatible target
# (the Bitnami chart nests the root credentials under auth.*)
helm install minio \
--set auth.rootUser=minio,auth.rootPassword=minio123 \
--set defaultBuckets=velero --set persistence.enabled=true \
oci://registry-1.docker.io/bitnamicharts/minio
Install Velero with the AWS plugin (works with any S3-compatible storage):
cat <<EOF > credentials-velero
[default]
aws_access_key_id = minio
aws_secret_access_key = minio123
EOF
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.10.0 \
--bucket velero \
--secret-file ./credentials-velero \
--use-volume-snapshots=false \
--backup-location-config region=minio,s3ForcePathStyle=true,s3Url=http://minio.default:9000
kubectl get pods -n velero
Deploy a Workload
kubectl create namespace demo
kubectl apply -n demo -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata: { name: hello }
spec:
  replicas: 2
  selector: { matchLabels: { app: hello } }
  template:
    metadata: { labels: { app: hello } }
    spec:
      containers:
        - name: hello
          image: nginxdemos/hello:plain-text
          ports: [{ containerPort: 80 }]
---
apiVersion: v1
kind: ConfigMap
metadata: { name: app-config }
data:
  greeting: "hello from the original deployment"
EOF
Take a Backup
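BACKUP=demo-backup-$(date +%s)
velero backup create "$BACKUP" \
--include-namespaces demo --wait
# Inspect the result
velero backup describe "$BACKUP" --details
The backup captures every Kubernetes resource in the namespace: here the Deployment and ConfigMap, plus any Services and PVCs you add later. Volume contents are included only when volume snapshots or file-system backup are enabled; this install sets --use-volume-snapshots=false, so they are not.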
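Because the backup name ends in epoch seconds, the newest backup can be picked by sorting on that suffix rather than relying on listing order. The sketch below assumes every name follows the demo-backup-<epoch> pattern used above; the function name newest_backup and the sample names are illustrative only.

```shell
# newest_backup: read backup names (one per line) on stdin and print the one
# with the largest numeric suffix. Assumes names end in "-<epoch seconds>".
newest_backup() {
  awk -F- '{ print $NF, $0 }' | sort -n | tail -n 1 | cut -d' ' -f2-
}

# Sample names, not real backups
printf '%s\n' demo-backup-1716300000 demo-backup-1716386400 demo-backup-1716213600 \
  | newest_backup
# prints demo-backup-1716386400
```

In a live cluster the input could come from something like kubectl get backups.velero.io -n velero -o name | cut -d/ -f2, provided all backups share the naming pattern.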
Simulate a Disaster
kubectl delete namespace demo
# Confirm it's gone
kubectl get all -n demo
# No resources found
Restore
# Restore from the most recent backup
BACKUP=$(kubectl get backups.velero.io -n velero \
--sort-by=.metadata.creationTimestamp -o name | tail -1 | cut -d/ -f2)
velero restore create demo-restore --from-backup "$BACKUP" --wait
# Inspect
velero restore describe demo-restore
# Verify
kubectl get all -n demo
kubectl get configmap -n demo app-config -o yaml | grep greeting
The resources reappear, which closes the backup → delete → restore loop. You've done your first DR test.
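A restore that has finished is not the same as a workload that is serving again, so a DR test usually polls a readiness check before declaring success. The following is a minimal sketch of such a polling loop; retry_until and deployment_ready are hypothetical helper names, and the expected replica count of 2 is taken from the demo Deployment above.

```shell
# retry_until MAX_ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry_until() {
  max=$1; pause=$2; shift 2
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    "$@" && return 0
    # Sleep only if another attempt will follow
    [ "$attempt" -lt "$max" ] && sleep "$pause"
    attempt=$((attempt + 1))
  done
  return 1
}

# Example check (assumes the demo Deployment with 2 replicas)
deployment_ready() {
  ready=$(kubectl get deployment hello -n demo -o jsonpath='{.status.readyReplicas}')
  [ "$ready" = "2" ]
}

# Usage: wait up to about a minute for both replicas
# retry_until 30 2 deployment_ready && echo "workload restored"
```

Keeping the check as a separate function means the same loop can wrap any other verification, such as an HTTP probe.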
Schedule Recurring Backups
velero schedule create daily-demo \
--schedule="0 2 * * *" \
--include-namespaces demo \
--ttl 720h0m0s # 30 days retention
This creates a daily backup at 02:00 UTC, retained for 30 days. Combine it with quarterly long-retention backups (TTL of 8760h, one year) for compliance.
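The TTL values are plain hour counts, so converting a retention period in days is a multiplication by 24. A small helper makes that explicit; ttl_for_days is a hypothetical name used only for this illustration.

```shell
# ttl_for_days DAYS: print a duration string in the hours/minutes/seconds
# form used with --ttl above.
ttl_for_days() {
  printf '%dh0m0s\n' "$(( $1 * 24 ))"
}

ttl_for_days 30    # prints 720h0m0s
ttl_for_days 365   # prints 8760h0m0s
```

The result can be passed straight through, for example --ttl "$(ttl_for_days 90)".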
PostgreSQL Point-in-Time Recovery
Stop using pg_dump as your only backup. PITR combines a base backup with archived WAL, so you can recover to any moment the archive covers, not just the time of the last dump.
Set up base backup + WAL archiving
# Run a PG container locally
docker run -d --name pg \
-e POSTGRES_PASSWORD=demo \
-v $(pwd)/pgdata:/var/lib/postgresql/data \
-v $(pwd)/pgarchive:/archive \
-v $(pwd)/pgdata-backups:/var/lib/postgresql/backups \
postgres:16
# Wait for initialization to finish, then configure WAL archiving
until docker exec pg pg_isready -U postgres; do sleep 1; done
docker exec pg bash -c "
chown postgres:postgres /archive
echo \"archive_mode = on\" >> /var/lib/postgresql/data/postgresql.conf
echo \"archive_command = 'cp %p /archive/%f'\" >> /var/lib/postgresql/data/postgresql.conf
echo \"wal_level = replica\" >> /var/lib/postgresql/data/postgresql.conf
"
docker restart pg
Take a base backup:
docker exec pg pg_basebackup -U postgres -D /var/lib/postgresql/backups/base-$(date +%Y%m%d) -Ft -z -P
Insert some data over time:
docker exec -i pg psql -U postgres <<EOF
CREATE TABLE events (id serial PRIMARY KEY, at timestamp DEFAULT now(), msg text);
INSERT INTO events (msg) VALUES ('1pm: hello'), ('2pm: world');
EOF
Disaster: drop the table
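# Note the current time first; it becomes the recovery target
docker exec -i pg psql -U postgres -Atc "SELECT now();"
docker exec -i pg psql -U postgres -c "DROP TABLE events;"
docker exec -i pg psql -U postgres -c "SELECT * FROM events;" # Errors: relation "events" does not exist
Restore to point-in-time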
Force a WAL switch so the latest changes reach the archive, stop Postgres, restore the base backup, and configure the recovery target:
# Archive the current WAL segment so the archive covers the disaster window
docker exec pg psql -U postgres -c "SELECT pg_switch_wal();"
docker stop pg
rm -rf pgdata && mkdir pgdata
tar xzf $(ls -t pgdata-backups/base-*/base.tar.gz | head -1) -C pgdata
# Create recovery.signal + recovery target
touch pgdata/recovery.signal
cat >> pgdata/postgresql.conf <<EOF
restore_command = 'cp /archive/%f %p'
# Replace with a timestamp after the inserts and before the DROP TABLE
recovery_target_time = '2026-05-21 14:30:00'
recovery_target_action = 'promote'
EOF
docker start pg
# Verify - the table is back, with rows up to your recovery point
docker exec -i pg psql -U postgres -c "SELECT * FROM events;"
In managed RDS / Cloud SQL this is a single API call: "restore to point in time X." The underlying mechanism is the same.
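Since the recovery target changes with every incident, it can help to wrap the recovery settings in a function that takes the data directory and the target time as arguments. This is a sketch under the same assumptions as the walkthrough (WAL archived to /archive, promote after recovery); write_recovery_settings is a hypothetical helper name.

```shell
# write_recovery_settings DATA_DIR TARGET_TIME
# Appends the PITR settings to DATA_DIR/postgresql.conf and creates an empty
# recovery.signal so Postgres starts in targeted recovery mode.
write_recovery_settings() {
  data_dir=$1
  target_time=$2
  cat >> "$data_dir/postgresql.conf" <<EOF
restore_command = 'cp /archive/%f %p'
recovery_target_time = '$target_time'
recovery_target_action = 'promote'
EOF
  : > "$data_dir/recovery.signal"
}

# Usage (the timestamp is a placeholder; use one from before the disaster):
# write_recovery_settings pgdata '2026-05-21 14:30:00'
```

The target must fall after the end of the base backup and inside the range covered by the archived WAL, otherwise recovery stops with an error instead of promoting.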
Offsite Encrypted Backup with restic
For files (configs, code, certificates), use restic. It encrypts and deduplicates data on the client and supports many storage backends.
# Initialize a restic repo on S3-compatible storage
# (for the in-cluster MinIO, first run: kubectl port-forward svc/minio 9000:9000 &)
export RESTIC_REPOSITORY="s3:http://localhost:9000/restic"
export RESTIC_PASSWORD="mystrongpassword"
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123
restic init
# First backup
restic backup ~/important-configs
# Subsequent backups - only changes are uploaded
restic backup ~/important-configs
# List snapshots
restic snapshots
# Restore (restic recreates the full original path under the target)
restic restore latest --target /tmp/restore-test
diff -r ~/important-configs "/tmp/restore-test$HOME/important-configs"
restic encrypts on the client, so the server (S3) never sees plaintext. The repository password is the only thing standing between an attacker with bucket access and your data, and losing it makes the backups unrecoverable. Secure it accordingly.
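One way to avoid typing or exporting the password inline is to generate a random one into a file only the owner can read and point restic at it through the RESTIC_PASSWORD_FILE environment variable. The sketch below assumes head, base64, and /dev/urandom are available; make_restic_password_file and the file location are hypothetical choices.

```shell
# make_restic_password_file FILE
# Writes 32 random bytes, base64-encoded, to FILE. The umask applies only
# inside the subshell, so a newly created FILE gets mode 0600.
make_restic_password_file() {
  file=$1
  ( umask 077 && head -c 32 /dev/urandom | base64 > "$file" )
}

# Usage (path is an example):
# make_restic_password_file "$HOME/.restic-password"
# export RESTIC_PASSWORD_FILE="$HOME/.restic-password"
# restic init
```

Keep a copy of the file somewhere separate from the repository, such as a password manager, because the backups cannot be decrypted without it.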
A Mini Failover Exercise
Pretend your primary region is gone. Walk through the steps without panic:
- Detect: alert fires; on-call confirms region is unreachable.
- Declare: "We are in a regional outage. Failing over to us-west-2."
- DNS update: Route53 health check fails → traffic shifts to standby region.
- Database promotion: fence the old primary (it is already unreachable, but it must not accept writes if it comes back); promote the read replica in us-west-2.
- Verify: synthetic transactions succeed; metrics flow.
- Communicate: status page updated; customer messaging sent.
Time this end-to-end, from the first alert to verified traffic in the standby region. The difference between your actual RTO and your target RTO is the gap to close.
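The timing itself can be as simple as recording epoch seconds at detection and at verification, then comparing the difference with the target. A minimal sketch follows; rto_gap is a hypothetical helper and the numbers in the example are made up.

```shell
# rto_gap START_EPOCH END_EPOCH TARGET_SECONDS
# Prints the measured RTO and how far it is over (+) or under (-) the target.
rto_gap() {
  actual=$(( $2 - $1 ))
  gap=$(( actual - $3 ))
  printf 'actual=%ds target=%ds gap=%+ds\n' "$actual" "$3" "$gap"
}

# Example: drill took 45 minutes against a 30-minute target
rto_gap 1000 3700 1800
# prints actual=2700s target=1800s gap=+900s
```

During a drill, capture start=$(date +%s) when the alert fires and end=$(date +%s) once synthetic transactions succeed, then call rto_gap "$start" "$end" with your target in seconds.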
Cleanup
kind delete cluster --name dr
docker rm -f pg
helm uninstall minio
rm -rf pgdata pgarchive pgdata-backups credentials-velero
What's Next
- Patterns — multi-region, snapshot lifecycle, immutable backups, runbooks, GameDay
- Best Practices — testing cadence, RTO measurement, encryption, compliance, pitfalls