Best Practices

Dev mode taught you the API. Production Vault is a different beast — it's stateful, it can be sealed (locked), and it sits in the critical path of every workload. A Vault outage is a deploy outage. Treat it accordingly.

Storage Backend

Vault stores its data in a configurable backend. The right choice defines your operational story:

Backend	Notes
Integrated Storage (Raft)	HashiCorp's recommended backend; built-in clustering, no external dependency. Use this for new deployments.
Consul	Older recommendation; still solid but adds an operational system to run
File / In-memory	Single-node only; testing
Cloud (S3, GCS, ...)	S3 has no HA support (no locking); GCS supports HA but is community-maintained. Fine for holding backups, not a recommended primary store.

Integrated Raft means 3 or 5 Vault nodes, no external Consul cluster, simpler ops. It's what you want.
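The 3-or-5 sizing follows from Raft's majority rule. A minimal sketch of the arithmetic, using two hypothetical helper functions (they are illustrations, not Vault commands):

```shell
#!/bin/sh
# Hypothetical helpers illustrating Raft majority arithmetic; not part of the Vault CLI.

# Votes needed to commit a write or elect a leader: floor(n/2) + 1.
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Nodes that can fail while the remaining nodes still form a quorum.
raft_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
  echo "nodes=$n quorum=$(raft_quorum "$n") tolerates=$(raft_fault_tolerance "$n")"
done
```

Three nodes survive one failure and five survive two; a fourth node raises the quorum without adding any tolerance.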

High Availability Topology

                    ┌──────────────────┐
              ┌────►│  Vault Leader    │ ◄─── writes
              │     └──────────────────┘
              │              │ Raft replication
   ┌─────────┴────┐    ┌─────▼─────────┐    ┌──────────────┐
   │  Client      │    │ Vault Standby │    │ Vault Standby│
   │              │    └───────────────┘    └──────────────┘
   └──────────────┘             ▲                    ▲
                                └────── reads ──────┘ (with performance standbys)

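3 or 5 nodes in the cluster. Use an odd count: a fourth node raises the quorum to 3 without tolerating any more failures than three nodes do.
One leader, others standby. Standbys forward requests to the leader; serving reads from performance standbys requires Enterprise.
A load balancer in front, health-checking /v1/sys/health: it returns 200 for the active node, 429 for a standby, and 503 for a sealed node.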
Spread across availability zones / racks — one zone failure shouldn't kill quorum.
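The health-check behaviour above can be sketched as a small probe. This assumes the endpoint's default status codes and that VAULT_ADDR points at a single node rather than the load balancer; both function names are hypothetical:

```shell
#!/bin/sh
# Map a /v1/sys/health HTTP status to a node state (default status codes).
vault_health_state() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    472) echo "dr-secondary" ;;
    473) echo "performance-standby" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unreachable-or-unknown" ;;
  esac
}

# Probe one node; curl reports 000 when it cannot connect at all.
probe_vault_node() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$VAULT_ADDR/v1/sys/health")
  vault_health_state "$code"
}

# Example: VAULT_ADDR=https://vault-1.example.com:8200 probe_vault_node
```

A load balancer that should route only to the leader would treat 200 as healthy and everything else as unhealthy.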

Sealing and Unsealing

Vault encrypts its data with keys protected by a root key (formerly called the master key), which is held only in memory while Vault runs. After every start or restart Vault is sealed: it can't decrypt anything until it is given enough unseal key shares. This is good (no plaintext on disk) and operationally awkward (the cluster won't come back without intervention).

Three ways to handle it:

Method	How
Shamir's Secret Sharing	Split unseal key into N shares; need M to unseal. Human ops.
Auto-unseal with cloud KMS	AWS KMS / GCP KMS / Azure Key Vault / OCI does the unseal. What you want in production.
HSM	Hardware security module holds the key. Enterprise/regulated environments.
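For the Shamir row, N and M are chosen when the cluster is initialized. A sketch, assuming 5 shares with a threshold of 3 (illustrative values); `check_shamir_params` and `init_with_shamir` are hypothetical wrappers, and Vault may enforce stricter rules than this guard does:

```shell
#!/bin/sh
# Hypothetical guard: the threshold M must satisfy 1 <= M <= N.
check_shamir_params() {
  shares=$1
  threshold=$2
  [ "$threshold" -ge 1 ] && [ "$threshold" -le "$shares" ]
}

# Initialize a new cluster with N key shares, any M of which can unseal it.
# Run once per cluster; the shares are printed a single time.
init_with_shamir() {
  check_shamir_params "$1" "$2" || { echo "invalid shares/threshold" >&2; return 1; }
  vault operator init -key-shares="$1" -key-threshold="$2"
}

# After every restart, M different key holders each run:
#   vault operator unseal
# Example: init_with_shamir 5 3
```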

Auto-unseal example (Raft + AWS KMS):

# vault.hcl
storage "raft" {
  path    = "/vault/data"
  node_id = "vault-1"
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault/tls/cert.pem"
  tls_key_file  = "/etc/vault/tls/key.pem"
}

seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "alias/vault-unseal"
}

cluster_addr = "https://vault-1.example.com:8201"
api_addr     = "https://vault.example.com:8200"

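Now restarts unseal automatically. Recovery keys (M-of-N) are still issued, but they only authorize privileged operations such as generating a new root token or rekeying; they cannot unseal Vault.

Auto-unseal trades one risk for another. If the KMS key is deleted or you permanently lose access to it, Vault cannot be unsealed and the recovery keys will not help, so protect the key itself (restricted IAM, deletion safeguards). Still distribute the recovery keys to named humans, store them out-of-band, and rehearse the recovery procedure at least once a year.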

Auth Methods

token is for humans and bootstrapping. Production workloads use an identity-aware auth method:

Method	Best for
Kubernetes	Pods authenticate with their ServiceAccount JWT
AWS IAM	EC2 / ECS / Lambda / EKS workloads with an IAM role
GCP / Azure	Same idea on those clouds
OIDC / JWT	GitHub Actions, GitLab CI, generic CI runners
AppRole	Anything else that can hold a secret_id (legacy fallback)
userpass / LDAP / SSO	Humans

Kubernetes Auth (the most common)

vault auth enable kubernetes

vault write auth/kubernetes/config \
  kubernetes_host="https://kubernetes.default.svc" \
  token_reviewer_jwt="@/var/run/secrets/kubernetes.io/serviceaccount/token" \
  kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt

vault write auth/kubernetes/role/myapp \
  bound_service_account_names=myapp \
  bound_service_account_namespaces=production \
  token_policies=app \
  token_ttl=1h
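The role above attaches a policy named app, which has to exist before logins are useful. A minimal sketch, assuming the app only needs the dynamic database credentials at database/creds/myapp-read (the path used in the injector example further down); the function names are hypothetical and the paths should be adjusted to your mounts:

```shell
#!/bin/sh
# Print a least-privilege policy for the "app" role (the path is an assumption).
render_app_policy() {
  cat <<'EOF'
# Read short-lived database credentials; nothing else is granted.
path "database/creds/myapp-read" {
  capabilities = ["read"]
}
EOF
}

# Upload it; "-" tells `vault policy write` to read the policy from stdin.
write_app_policy() {
  render_app_policy | vault policy write app -
}
```

In a policy-as-code setup the rendered HCL would live in git and `write_app_policy` would run from CI rather than from a terminal.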

A pod with that ServiceAccount can now exchange its projected JWT for a Vault token:

# Inside the pod
JWT=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
# --fail leaves VAULT_TOKEN empty on an HTTP error instead of the string "null"
VAULT_TOKEN=$(curl -sS --fail --request POST \
  --data "{\"jwt\":\"$JWT\",\"role\":\"myapp\"}" \
  "$VAULT_ADDR/v1/auth/kubernetes/login" | jq -r .auth.client_token)

Or use the Vault Agent Injector, a mutating webhook that adds an init container and/or a sidecar to your pods; the agent does the auth and writes secrets to a shared volume. No code changes in your app.

CI Auth with OIDC (GitHub Actions example)

# The job must grant `permissions: id-token: write` so GitHub issues an OIDC token
- name: Auth to Vault
  uses: hashicorp/vault-action@v3
  with:
    url: https://vault.example.com
    method: jwt
    role: gha-deploy
    secrets: |
      secret/data/deploy   aws_access_key | AWS_ACCESS_KEY_ID ;
      secret/data/deploy   aws_secret_key | AWS_SECRET_ACCESS_KEY ;

GitHub's OIDC token is verified by Vault; no long-lived secret in GitHub. This is the modern way to wire CI to a secret store.
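On the Vault side, the gha-deploy role has to trust GitHub's issuer and be pinned to a specific repository. A sketch under assumptions: the jwt auth method is mounted at its default path, `my-org/my-repo` and the `deploy` policy are hypothetical names, and the audience is a placeholder that must match whatever audience the workflow actually requests:

```shell
#!/bin/sh
# Render the role definition as JSON. Arguments: <org/repo> <audience>.
render_gha_role() {
  cat <<EOF
{
  "role_type": "jwt",
  "user_claim": "repository",
  "bound_audiences": ["$2"],
  "bound_claims": { "repository": "$1" },
  "token_policies": ["deploy"],
  "token_ttl": "10m"
}
EOF
}

# Enable the method, point it at GitHub's OIDC issuer, and create the role.
configure_github_jwt() {
  vault auth enable jwt
  vault write auth/jwt/config \
    oidc_discovery_url="https://token.actions.githubusercontent.com" \
    bound_issuer="https://token.actions.githubusercontent.com"
  render_gha_role "$1" "$2" | vault write auth/jwt/role/gha-deploy -
}

# Example (placeholder values): configure_github_jwt my-org/my-repo https://github.com/my-org
```

Binding on the repository claim is what keeps another repository's workflow from logging in with the same role; tighter bindings (branch, environment) follow the same pattern.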

Audit Devices

Turn on audit logging on day one — there's no useful forensics without it.

vault audit enable file file_path=/vault/logs/audit.log
# Or stream to syslog, or to a socket → your log pipeline

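Every request and response is logged, with sensitive fields HMAC'd rather than written in plaintext. If none of the enabled audit devices can record a request, Vault refuses to serve it, so enable at least two. Ship those logs to your SIEM / ELK.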

Kubernetes Integration Patterns

Three ways to get secrets to pods, ranked by sophistication:

1. Vault Agent Injector (simple, popular)

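Annotations on a Deployment's pod template tell the injector to add an init/sidecar container that auths to Vault and writes secrets to a shared volume:

# Goes under spec.template.metadata (pod annotations), not the Deployment's own metadata
metadata: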
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "myapp"
    vault.hashicorp.com/agent-inject-secret-db: "database/creds/myapp-read"
    vault.hashicorp.com/agent-inject-template-db: |
      {{- with secret "database/creds/myapp-read" -}}
      DB_USER={{ .Data.username }}
      DB_PASS={{ .Data.password }}
      {{- end -}}

The app reads /vault/secrets/db — never knows Vault exists. The agent renews leases on its own.
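On the consuming side, an entrypoint script might load that file into the environment before starting the app. A sketch, assuming the rendered file contains plain KEY=VALUE lines as in the template above; `load_secrets_file` is a hypothetical helper, and it parses the file instead of sourcing it so that values are never executed as shell code:

```shell
#!/bin/sh
# Export KEY=VALUE pairs from a rendered secrets file without evaluating them.
load_secrets_file() {
  file=$1
  # The `|| [ -n "$line" ]` keeps a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ''|'#'*) continue ;;   # skip blank lines and comments
      *=*) ;;                # KEY=VALUE, handled below
      *) continue ;;         # ignore anything else
    esac
    key=${line%%=*}          # text before the first "="
    value=${line#*=}         # everything after it, later "=" signs included
    case $key in
      ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;;   # not a valid variable name
    esac
    export "$key=$value"
  done < "$file"
}

# Example entrypoint:
#   load_secrets_file /vault/secrets/db && exec ./myapp
```

Because the values are read once at startup, a process started this way will not see rotated credentials until it restarts; apps that need live rotation should re-read the file.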

2. External Secrets Operator + Vault

The External Secrets Operator pulls from Vault and creates native Kubernetes Secrets. Apps consume them via standard envFrom / volume mounts.

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-creds
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: db-creds-secret
  data:
    - secretKey: password
      remoteRef:
        key: secret/app/db
        property: password

Trade-off: secrets land as native Kubernetes Secrets, which are only base64-encoded, so anyone who can read Secrets in the namespace can read them. Less ideal for true zero-trust, but a smooth integration path.

3. CSI Secrets Store + Vault provider

Mount secrets as files via the Kubernetes Secrets Store CSI Driver with the Vault provider. Similar to the agent injector, but the volume is mounted by a node-level CSI driver instead of a per-pod agent container.

For most teams, Vault Agent Injector is the right starting point; graduate to ESO if you need broader K8s ecosystem integration.

Backup and Disaster Recovery

Vault's data is the recipe for accessing everything else — losing it is catastrophic.

# Take a Raft snapshot
vault operator raft snapshot save backup-$(date +%F).snap

# Restore (replaces all data!)
vault operator raft snapshot restore backup-2026-05-21.snap

Automation:

Schedule snapshots every 15-60 minutes via a CronJob.
Ship snapshots off-cluster to S3 / GCS with encryption (separate KMS key).
Test restores quarterly. An untested backup is not a backup.
Cross-region replica for DR if you have Enterprise (Performance / DR replication).
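The automation steps above could be sketched as a script run by the CronJob. Assumptions: the snapshot directory, bucket name, and KMS alias are hypothetical, the AWS CLI is available, and VAULT_ADDR and VAULT_TOKEN are already set. Names carry a UTC timestamp because a date-only name would collide when snapshots run more than once a day:

```shell
#!/bin/sh
# UTC-timestamped name: unique at sub-daily intervals and sorts chronologically.
snapshot_name() {
  echo "vault-$(date -u +%Y%m%dT%H%M%SZ).snap"
}

# Keep only the newest $2 snapshots in directory $1.
prune_snapshots() {
  dir=$1
  keep=$2
  set -- "$dir"/vault-*.snap        # glob expands in sorted, oldest-first order
  [ -e "$1" ] || return 0           # nothing matched: no snapshots yet
  while [ "$#" -gt "$keep" ]; do
    rm -f -- "$1"
    shift
  done
}

# One backup run: save locally, copy off-cluster with a separate KMS key, prune.
backup_vault() {
  snap_dir=/vault/snapshots
  file="$snap_dir/$(snapshot_name)"
  vault operator raft snapshot save "$file" &&
    aws s3 cp "$file" "s3://example-vault-backups/" \
      --sse aws:kms --sse-kms-key-id alias/vault-backups &&
    prune_snapshots "$snap_dir" 24
}
```

Pruning only runs after the upload succeeds, so a failed copy never deletes the last good local snapshots.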

Operational Habits

A handful that pay off:

TLS for everything. Self-signed for dev; ACM / Let's Encrypt / your CA in production. Vault should never speak plain HTTP.
Never use the root token from a workflow. Use the initial root token only for first-time setup, then revoke it; if you ever need one again, mint it with vault operator generate-root and revoke it when done. Day-to-day work uses identity-bound tokens.
Policy as code. Policies in git, deployed by CI. vault policy write from terminals is a smell.
One mount per logical concern. secret/, database/, pki/internal/, transit/customer-data/ — paths express intent.
Lease TTLs short by default. 1h, not 30 days. If something can't handle 1h renewal, instrument it.
Don't enable everything. Each secrets engine and auth method is an attack surface. Enable what you actively use.
Monitor the seal status of every node. A standby that is still sealed cannot take over when the leader fails, which quietly removes your HA.
Watch Vault token usage and failed login metrics. Anomalies = early signal of misuse.
