Steven's Knowledge

Best Practices

Production-ready Docker - security, registries, signing, scanning, build pipelines, and operational habits

The patterns here separate "it works on my laptop" from "I'm comfortable running this in production."

Security

Containers share the host kernel, so a compromised container reaches farther than a compromised process inside a VM, which still has a hypervisor between it and the host. Default to least privilege.

Run as Non-Root

Almost every base image starts as root. Always drop privileges:

FROM node:20-alpine
RUN addgroup --system --gid 1001 appgroup \
 && adduser --system --uid 1001 --ingroup appgroup appuser

WORKDIR /app
COPY --chown=appuser:appgroup . .

USER appuser
CMD ["node", "server.js"]
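
A CI step can check that a Dockerfile really drops privileges. The following is a minimal sketch, not a full linter: final_user and runs_as_non_root are hypothetical helper names, and the check assumes each base image itself runs as root.

```shell
# Print the USER in effect at the end of the final build stage.
# Prints an empty line if that stage never sets USER.
final_user() {
  awk '
    toupper($1) == "FROM" { user = "" }   # new stage: assume the base image runs as root
    toupper($1) == "USER" { user = $2 }
    END { print user }
  ' "$1"
}

# Succeed only if the final stage switches to a non-root user.
runs_as_non_root() {
  case "$(final_user "$1")" in
    ""|root|root:*|0|0:*) return 1 ;;
    *) return 0 ;;
  esac
}
```

In a pipeline this could run as `runs_as_non_root Dockerfile || exit 1` before the build step.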

In Compose:

services:
  app:
    user: "1001:1001"
    read_only: true                          # filesystem is read-only
    tmpfs:
      - /tmp                                  # except a tmpfs for /tmp
    cap_drop:
      - ALL                                   # drop all Linux capabilities
    security_opt:
      - no-new-privileges:true                # can't escalate via setuid binaries
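
The same hardening can be applied without Compose. As a rough sketch, the helper below prints the `docker run` flags that correspond to the Compose settings above; hardening_flags is a hypothetical name, and the UID/GID values are simply the ones used in this section.

```shell
# Print docker run flags that mirror the Compose hardening settings, one per line.
hardening_flags() {
  printf '%s\n' \
    --user 1001:1001 \
    --read-only \
    --tmpfs /tmp \
    --cap-drop ALL \
    --security-opt no-new-privileges=true
}
```

With Docker installed, it could be used as `docker run --rm $(hardening_flags) myapp:0.1.0` (left unquoted on purpose so each flag becomes its own argument).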

Mind the Base Image

| Choice | Notes |
| --- | --- |
| alpine | ~5 MB; musl libc surprises some apps (e.g. native Node modules) |
| <lang>-slim (Debian-based) | Larger than alpine but maximum compatibility |
| gcr.io/distroless/... | No shell, no package manager; minimal attack surface |
| scratch | Empty base; for statically-linked binaries (Go, Rust) |

Don't use giant general-purpose images (ubuntu:latest, centos:latest) for production apps. Use the language-specific or distroless variant.

Never Bake Secrets In

Anything passed via --build-arg or copied during build is recoverable from the image's layers. Real secrets enter at runtime:

services:
  app:
    env_file: .env                           # local dev
    # In production, inject from a secret manager
    environment:
      DATABASE_URL: ${DATABASE_URL}
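
Because the secret only exists at runtime, it helps to fail fast when it is missing rather than crash later on the first database call. A minimal sketch of such a start-up check follows; require_env is a hypothetical helper, not part of Docker or Compose.

```shell
# Return non-zero (and name the culprits on stderr) if any listed
# environment variable is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # indirect lookup of the variable called $name
    if [ -z "$value" ]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

An entrypoint script might call `require_env DATABASE_URL || exit 1` before starting the app.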

For build-time secrets (private dep registries, npm tokens), use BuildKit's secret mounts:

# syntax=docker/dockerfile:1.7
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci --omit=dev

Then pass the secret on the command line at build time:

DOCKER_BUILDKIT=1 docker build \
  --secret id=npmrc,src=$HOME/.npmrc \
  -t myapp:0.1.0 .

The secret is mounted for that RUN instruction only and never lands in an image layer.

Scan Every Image

Catch CVEs before they ship. Popular options, most of them free:

| Tool | Notes |
| --- | --- |
| Trivy | Fast, used everywhere, runs in CI |
| Grype | From the Anchore ecosystem |
| Docker Scout | Built into recent Docker Desktop |
| Snyk | Commercial; SaaS dashboards |

In CI:

# .github/workflows/build.yml
- name: Build
  run: docker build -t myapp:${{ github.sha }} .

- name: Trivy scan
  uses: aquasecurity/trivy-action@master     # pin a release tag or commit SHA in real pipelines
  with:
    image-ref: myapp:${{ github.sha }}
    severity: HIGH,CRITICAL
    exit-code: 1                             # fail the build on findings

Sign Your Images

After "is this CVE-free?" comes "is this actually the image I built?" Cosign signs images and stores the signatures in the registry:

# Sign on push
cosign sign --key cosign.key registry/myapp:0.1.0

# Verify before deploy
cosign verify --key cosign.pub registry/myapp:0.1.0

Kubernetes admission controllers (Kyverno, Connaisseur) can enforce this by refusing to run images that aren't signed by your key.

Image Hygiene

| Habit | Why |
| --- | --- |
| Pin tags; never latest | Reproducible deploys; rollbacks are meaningful |
| Use digests in production (@sha256:...) | The same tag can be republished |
| Semver tags + digests for shared images | Humans read :1.2.3; machines pin the digest |
| Multi-arch builds (linux/amd64,linux/arm64) | M-series Macs, Graviton instances |
| .dockerignore in every project | Faster builds, fewer mistakes |
| One process per container | Easier to scale, restart, monitor |
| Stateless containers | All state in volumes / databases / object storage |
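
The pinning habits can be checked mechanically. As a small sketch, the hypothetical helper classify_ref below sorts an image reference into digest, tag, or latest, so a deploy script could reject anything that is not pinned. It inspects only the reference string and does not contact a registry.

```shell
# Classify an image reference as "digest", "tag", or "latest".
classify_ref() {
  ref=$1
  case "$ref" in
    *@sha256:*) echo digest; return ;;
  esac
  last=${ref##*/}   # drop registry and path so a registry port is not mistaken for a tag
  case "$last" in
    *:latest) echo latest ;;
    *:*)      echo tag ;;
    *)        echo latest ;;   # no tag given: Docker resolves it to :latest
  esac
}
```

For example, `classify_ref localhost:5000/myapp` reports latest even though the reference contains a colon, because the colon belongs to the registry port.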

Multi-Arch Builds with Buildx

docker buildx create --name multi --use
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t myregistry/myapp:0.1.0 \
  --push \
  .

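--push is needed for multi-arch builds here: the classic local image store can load only one architecture at a time, while the registry holds the manifest list that ties the per-architecture images together. (Docker engines using the containerd image store can also keep multi-platform images locally.)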
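
When a client pulls a multi-arch tag, it picks the manifest-list entry that matches its own platform. The sketch below shows how machine names reported by `uname -m` line up with the two platform strings used above; platform_for is a hypothetical helper and covers only these two architectures.

```shell
# Map a machine architecture name (as printed by uname -m) to a Docker platform string.
platform_for() {
  case "$1" in
    x86_64|amd64)  echo linux/amd64 ;;
    aarch64|arm64) echo linux/arm64 ;;
    *) echo "architecture not mapped in this sketch: $1" >&2; return 1 ;;
  esac
}
```

On a given host, `platform_for "$(uname -m)"` shows which of the two images built above would be selected.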

Registries

| Registry | Notes |
| --- | --- |
| Docker Hub | Default; free public, rate-limited for unauthenticated pulls |
| GitHub Container Registry (ghcr.io) | Free private, tied to GitHub auth |
| AWS ECR | First-class for AWS workloads; private by default |
| Google Artifact Registry (GAR) | GCP equivalent |
| Azure Container Registry (ACR) | Azure equivalent |
| Self-hosted (distribution/registry, Harbor) | Air-gapped or strict control |

Push:

echo "$TOKEN" | docker login ghcr.io -u "$USER" --password-stdin
docker tag myapp:0.1.0 ghcr.io/myorg/myapp:0.1.0
docker push ghcr.io/myorg/myapp:0.1.0

A few habits:

  • A pull-through cache in front of Docker Hub avoids rate limits in CI.
  • Garbage-collect old tags on a schedule; most registries don't clean up unless you configure a retention policy.
  • Restrict who can push. A single typo with :latest push privileges can take down a fleet.
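
A retention policy usually boils down to "keep the N newest versions". The sketch below only selects the candidates; tags_to_prune is a hypothetical helper, it assumes plain MAJOR.MINOR.PATCH tags, and the actual deletion call depends on the registry and is deliberately left out.

```shell
# Read semver tags (one per line) on stdin and print every tag except
# the $1 newest, i.e. the candidates for deletion.
tags_to_prune() {
  keep=$1
  sort -t. -k1,1nr -k2,2nr -k3,3nr | tail -n +"$((keep + 1))"
}
```

For instance, feeding it 1.0.0, 1.2.0, 1.10.0, and 0.9.1 with `tags_to_prune 2` keeps 1.10.0 and 1.2.0 and prints the other two. Anything still referenced by digest in production should be excluded before deleting.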

Build Pipelines

A standard CI flow for app images:

# .github/workflows/build.yml
name: build
on:
  push:
    branches: [main]
    tags:    ["v*.*.*"]
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      id-token: write                        # for Cosign keyless
    steps:
      - uses: actions/checkout@v4

      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3

      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=sha,prefix=

      - id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: ${{ github.event_name != 'pull_request' }}
          tags:   ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to:   type=gha,mode=max

      # PR images are not pushed, so load a single-platform image locally for the scan
      - uses: docker/build-push-action@v6
        if: github.event_name == 'pull_request'
        with:
          context: .
          load: true
          tags: myapp:scan
          cache-from: type=gha

      - uses: aquasecurity/trivy-action@master   # pin a release tag or commit SHA in real pipelines
        if: github.event_name == 'pull_request'
        with:
          image-ref: myapp:scan
          severity: HIGH,CRITICAL
          exit-code: 1

      - uses: sigstore/cosign-installer@v3
        if: startsWith(github.ref, 'refs/tags/v')
      # The digest is an output of the build step, not of metadata-action
      - run: cosign sign --yes ghcr.io/${{ github.repository }}@${{ steps.build.outputs.digest }}
        if: startsWith(github.ref, 'refs/tags/v')

The shape: PR builds + scans; main builds, pushes, and caches; tag builds also sign.
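
That shape can be written down as a small decision function. This is only an illustration of the branching in the workflow's `if:` conditions; pipeline_actions is a hypothetical name and the stage labels are informal.

```shell
# Map a GitHub event name and ref to the stages the workflow runs.
pipeline_actions() {
  event=$1
  ref=$2
  case "$event" in
    pull_request) echo "build scan" ;;
    push)
      case "$ref" in
        refs/tags/v*) echo "build push cache sign" ;;
        *)            echo "build push cache" ;;
      esac ;;
    *) echo "" ;;   # other events do not trigger this workflow
  esac
}
```

So `pipeline_actions push refs/tags/v1.2.3` yields every stage including signing, while a pull request stops at the scan.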

Observability

Containers without logs and metrics are a black box. Minimum:

| Concern | Approach |
| --- | --- |
| Logs | Write to stdout/stderr; the runtime captures them |
| Log shipping | docker run --log-driver=... or a sidecar (Fluent Bit, Vector) |
| Metrics | Instrument the app (see Prometheus & Grafana) |
| Container metrics | cAdvisor / node_exporter on the host |
| Healthchecks | HEALTHCHECK in the Dockerfile and Compose-level checks |
| STOPSIGNAL and PID 1 | Make sure SIGTERM reaches your app; use exec form of CMD |

# Signal Docker sends on stop (SIGTERM is also the default).
# Dockerfile comments must sit on their own line, not after an instruction.
STOPSIGNAL SIGTERM

# Exec form runs the binary directly as PID 1, so it receives SIGTERM itself
CMD ["node", "server.js"]

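The shell form (CMD node server.js) wraps the command in /bin/sh -c, so a shell may sit at PID 1 and not forward SIGTERM to the app; Docker then waits out the stop timeout and sends SIGKILL. Always prefer exec form for production.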
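
When a container does need a shell script for start-up work, the usual way to keep signals flowing is to end the script with `exec`, which replaces the shell with the app instead of running it as a child. A minimal sketch, where entrypoint and APP_ENV are hypothetical names chosen for illustration:

```shell
# Entrypoint wrapper: run setup in the shell, then hand the process over to the app.
entrypoint() {
  : "${APP_ENV:=production}"   # example setup step: default a config value
  export APP_ENV
  exec "$@"                    # replace the shell; the app keeps the same PID
}
```

Nothing after the `exec` line runs, because the shell no longer exists at that point. In an image, a script ending in `entrypoint "$@"` would be wired up with an exec-form ENTRYPOINT, so the app ends up as PID 1 and receives SIGTERM directly.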

Operational Habits

A handful that pay off:

  1. One image, many environments. Build once; differ only by runtime config (env vars, mounts).
  2. Immutable tags. Once myapp:1.2.3 is pushed, never overwrite it.
  3. Roll forward, not back, in production. When a rollback is unavoidable, it means "deploy the previous immutable tag," not "fix the image in place."
  4. Run as non-root and read-only. Almost every container can.
  5. Reap zombies. Use tini (docker run --init) or dumb-init if your app spawns children.
  6. Cap resources. Memory limits in particular — OOM-killed by the kernel beats grinding the whole host to a halt.
  7. Don't bundle databases in the same image as the app. Always separate containers; usually a managed DB in real production.

Checklist

Pre-production Docker checklist

  • Images built from a Dockerfile in version control (no images created with docker commit)
  • Multi-stage builds with a minimal runtime base
  • .dockerignore in every project
  • Containers run as non-root with USER
  • Read-only root filesystem; writes only to mounted volumes
  • cap_drop: ALL, no-new-privileges, resource limits set
  • Image tags pinned and immutable; production references digests
  • Build pipeline scans images (Trivy / Scout) and signs them (Cosign)
  • Multi-arch (amd64 + arm64) where relevant
  • HEALTHCHECK defined and consumed (Compose depends_on, K8s probes)
  • App writes logs to stdout/stderr; shipped off-host
  • Secrets supplied at runtime from a secret manager, not baked in
  • STOPSIGNAL and PID 1 behavior correct; graceful shutdown tested
  • Registry credentials least-privilege; tag retention policy in place
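
A few of these items can be verified statically before anything is built. The sketch below covers only file presence and instruction presence (it does not judge whether the HEALTHCHECK or USER is sensible); check_project is a hypothetical helper name.

```shell
# Check a project directory against the statically verifiable checklist items.
# Prints one line per failed check and returns non-zero if any check fails.
check_project() {
  dir=$1
  failed=0
  [ -f "$dir/Dockerfile" ]    || { echo "missing Dockerfile"; failed=1; }
  [ -f "$dir/.dockerignore" ] || { echo "missing .dockerignore"; failed=1; }
  if [ -f "$dir/Dockerfile" ]; then
    grep -Eiq '^[[:space:]]*HEALTHCHECK[[:space:]]' "$dir/Dockerfile" \
      || { echo "no HEALTHCHECK instruction"; failed=1; }
    grep -Eiq '^[[:space:]]*USER[[:space:]]' "$dir/Dockerfile" \
      || { echo "no USER instruction"; failed=1; }
  fi
  return "$failed"
}
```

Running `check_project .` in CI gives a quick first gate; the remaining items (scanning, signing, graceful shutdown) still need the pipeline and runtime tests described above.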
