Steven's Knowledge

Best Practices

Threat modeling the pipeline, key management, false positive handling, compliance, common pitfalls, scaling

The operational realities of running supply chain security at scale.

Threat Model Your Pipeline

Before adopting tools, understand what you're defending against:

| Threat | What it looks like |
| --- | --- |
| Source compromise | Attacker pushes to your repo (stolen creds, malicious PR) |
| Dependency injection | Malicious update to a dep you use (typosquat, hijacked maintainer) |
| Build compromise | CI runner runs attacker code (poisoned GitHub Action, escaped sandbox) |
| Artifact tampering | Image swapped between build and deploy (compromised registry, MITM) |
| Key theft | Long-lived signing key stolen; arbitrary signing |
| Deploy bypass | Direct kubectl apply skips signed pipeline |
| Runtime injection | kubectl exec or container escape modifies running pod |

Match each threat to at least one control, and layer the defenses: any single control can fail under a sophisticated attack.
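One way to keep the threat-to-control mapping honest is to store it as data and check it for gaps. The sketch below is illustrative only: the pairing of threats to controls is an assumption assembled from controls named elsewhere on this page, and the helper names (`uncovered`, `single_layer`) are hypothetical.

```python
# Illustrative mapping of the threats in the table above to controls named
# elsewhere on this page. The pairing is an assumption, not a standard.
THREAT_CONTROLS = {
    "Source compromise": ["branch protection", "required PR review"],
    "Dependency injection": [
        "lockfile + integrity hashes",
        "caching proxy",
        "vulnerability scan in CI",
    ],
    "Build compromise": ["ephemeral runners", "Actions pinned by SHA"],
    "Artifact tampering": ["image signing", "admission verification"],
    "Key theft": ["keyless signing or KMS-backed keys"],
    "Deploy bypass": ["admission policy"],
    "Runtime injection": [],  # nothing mapped yet: a gap to close
}


def uncovered(mapping):
    """Threats with no control at all."""
    return [threat for threat, controls in mapping.items() if not controls]


def single_layer(mapping):
    """Threats that rely on exactly one control (no defense in depth)."""
    return [threat for threat, controls in mapping.items() if len(controls) == 1]


if __name__ == "__main__":
    print("No control:", uncovered(THREAT_CONTROLS))
    print("Single layer:", single_layer(THREAT_CONTROLS))
```

Reviewing the two lists during a threat model review gives a concrete agenda: close the uncovered threats first, then add a second layer where only one exists.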

Key Management

If you must have keys:

  • No plaintext private keys on disk beyond bootstrap. Use a KMS or HSM (AWS KMS, GCP KMS, Azure Key Vault, YubiHSM).
  • Short-lived where possible. Sigstore keyless is the gold standard — no key to manage.
  • Separate keys per environment. Dev signing key compromise shouldn't affect prod.
  • Rotation procedure tested before you need it.
  • Witness signatures for irreplaceable keys: M-of-N signing for the root key.
  • Audit log: every signing operation logged with identity, artifact digest, timestamp.

Cosign supports KMS providers directly:

cosign sign --key awskms:///alias/cosign-key $DIGEST

The private key never leaves the KMS or HSM.

Scaling Beyond One Image

With a few images, copy-pasted workflows are manageable. With hundreds, you need templates. Common patterns:

  • Reusable GitHub Actions workflow (uses:) that builds + signs + SBOMs in a standard way. All services call it.
  • Make / Justfile / Bazel target that wraps the chain locally and in CI.
  • Internal Developer Platform (IDP) golden path that scaffolds a new service with signing wired up by default.
  • Compliance dashboards (Backstage / Dependency-Track) that show coverage: signed % / SBOM'd % / SLSA level per service.

The metric: percentage of production images that pass full verification. Trend this; teams gravitate up when they see they're outliers.
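The coverage metric can be computed from whatever inventory the dashboard already has. The following is a minimal sketch, assuming "full verification" means signed, SBOM attached, and provenance verified; the `ImageStatus` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageStatus:
    """Hypothetical per-image record, e.g. exported from a dashboard."""
    service: str
    signed: bool
    has_sbom: bool
    provenance_verified: bool

    @property
    def fully_verified(self) -> bool:
        # Assumption: "full verification" means all three checks pass.
        return self.signed and self.has_sbom and self.provenance_verified


def coverage_percent(images) -> float:
    """Percentage of images that pass every check, rounded to one decimal."""
    images = list(images)
    if not images:
        return 0.0
    passing = sum(1 for image in images if image.fully_verified)
    return round(100.0 * passing / len(images), 1)


def outliers(images):
    """Services with at least one image that fails verification."""
    return sorted({image.service for image in images if not image.fully_verified})
```

Recording the percentage per day or per release gives the trend line, and the `outliers` list is what teams see when they check where they stand.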

Handling False Positives

Vulnerability scanners are noisy. A real production scan finds:

  • Critical CVEs in code paths you don't use
  • Vulnerabilities in test-only dependencies
  • Issues in the base image for which the distro hasn't yet backported a fix
  • Disputed CVEs (security researchers and vendors disagreeing)

Without triage, alert fatigue takes over and real issues get missed. Process:

  1. Triage rules: auto-suppress development-only deps, accepted base images.
  2. VEX statements for "vulnerable but not exploitable" cases — signed declarations.
  3. SLA on critical: 14 days to fix; document if not.
  4. Block on new critical in PRs (Renovate/Dependabot policy).
  5. Quarterly review of suppressions: still valid? Still vulnerable?

The goal: every open finding has a status (fixed, in-progress, accepted with rationale). None should be "ignored because there are too many."
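The "every finding has a status" goal is easy to check mechanically. Below is a rough sketch under stated assumptions: the `Finding` record is hypothetical, the status names mirror the three listed above, and the 14-day limit comes from the SLA in step 3. The CVE identifier in the usage example is a placeholder, not a real advisory.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

CRITICAL_SLA = timedelta(days=14)  # from the SLA in the process above
VALID_STATUSES = {"fixed", "in-progress", "accepted"}


@dataclass
class Finding:
    """Hypothetical triage record for one scanner finding."""
    cve_id: str
    severity: str
    opened: date
    status: Optional[str] = None
    rationale: str = ""


def needs_attention(finding: Finding, today: date) -> list:
    """Return the reasons a finding violates the triage process (empty if none)."""
    reasons = []
    if finding.status not in VALID_STATUSES:
        reasons.append("no status")
    if finding.status == "accepted" and not finding.rationale.strip():
        reasons.append("accepted without rationale")
    unresolved = finding.status not in {"fixed", "accepted"}
    if (
        finding.severity == "critical"
        and unresolved
        and today - finding.opened > CRITICAL_SLA
    ):
        reasons.append("critical past 14-day SLA")
    return reasons


if __name__ == "__main__":
    overdue = Finding("CVE-0000-0001", "critical", date(2024, 1, 1), "in-progress")
    print(needs_attention(overdue, date(2024, 1, 20)))
```

Running a check like this in the quarterly suppression review surfaces the findings that were silently ignored rather than consciously accepted.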

CI Hardening

If your CI is compromised, so is your supply chain:

  • Ephemeral runners: fresh runner per job, no state carries between jobs.
  • OIDC for cloud auth: no long-lived secrets in CI. GitHub Actions / GitLab CI both support OIDC to AWS/GCP/Azure.
  • Pin Actions / Plugins to a full commit SHA, not a version tag. Tags are mutable: actions/checkout@v4 can be repointed at malicious code; actions/checkout@abc123... cannot.
  • permissions: read-all by default, write only where needed.
  • Branch protection: require reviewed PRs to merge to main.
  • Required workflows: GitHub Enterprise can enforce that certain workflows run.
  • Forks treated as untrusted: PRs from forks don't get secrets access (default behavior; verify).

The CI is now a production system. Apply production-grade controls to it.
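The SHA-pinning rule above lends itself to an automated check. This is a minimal sketch that scans workflow text line by line with a regular expression instead of a YAML parser, so it assumes each `uses:` entry sits on its own line. The function name is hypothetical, and the 40-character SHA in the usage example is a placeholder, not a real commit.

```python
import re

# Matches "uses: owner/repo@ref" with an optional leading "- " and quotes.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^\s'\"#]+)")
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list:
    """Return every `uses:` reference that is not pinned to a full commit SHA."""
    findings = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group(1)
        # Local actions and container images are out of scope for this sketch.
        if ref.startswith("./") or ref.startswith("docker://"):
            continue
        _, _, version = ref.rpartition("@")
        if not FULL_SHA_RE.match(version):
            findings.append(ref)
    return findings


if __name__ == "__main__":
    sample = """\
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@0123456789abcdef0123456789abcdef01234567
  - name: Local build step
    uses: ./.github/actions/build
"""
    print(unpinned_actions(sample))
```

A check like this can run as a required CI job across all repositories, failing the build when the returned list is non-empty.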

Registry Hygiene

The container registry is high-value. Protect:

  • Authentication required for push; pull may be public if intentional.
  • Image scanning at registry level where the registry offers it (Harbor and ECR have built-in scanners); for registries without one, scan in CI before push.
  • Immutable tags for releases — v1.2.3 can never be re-pushed.
  • Quarantine new images until scan completes; only verified images move to production paths.
  • Cleanup policy: delete old, untagged images; reduces attack surface and storage cost.
  • Replicate to secondary registry for DR. Don't lose the ability to deploy because GitHub Container Registry is down.

Documentation

Auditors will ask:

  • How are images signed? Point to the workflow file.
  • How is signing verified? Point to the Kyverno / Sigstore policy.
  • Show the chain of custody for v1.2.3. Run cosign tree and cosign verify-attestation, and store the output.
  • Show vulnerability triage process. SLA doc; example tickets.
  • Show response to log4j. SBOM query; affected services; patch timeline.

Document each. The documents are the evidence of process, separate from the technical controls.
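The "response to log4j" question comes down to an SBOM query. The sketch below assumes CycloneDX JSON SBOMs, where packages appear in a top-level `components` array with `name` and `version` fields. It reads only that top-level list, so nested components are not searched. The function names and the service names in the usage example are hypothetical.

```python
import json


def find_component(sbom_json: str, name_fragment: str) -> list:
    """Return (name, version) pairs for components whose name contains the fragment."""
    document = json.loads(sbom_json)
    hits = []
    for component in document.get("components", []):
        name = component.get("name", "")
        if name_fragment.lower() in name.lower():
            hits.append((name, component.get("version", "unknown")))
    return hits


def affected_services(sboms: dict, name_fragment: str) -> dict:
    """Map each service to its matching components, omitting unaffected services."""
    result = {}
    for service, sbom_json in sboms.items():
        hits = find_component(sbom_json, name_fragment)
        if hits:
            result[service] = hits
    return result


if __name__ == "__main__":
    sboms = {
        "billing": json.dumps(
            {"components": [{"name": "log4j-core", "version": "2.14.1"}]}
        ),
        "frontend": json.dumps(
            {"components": [{"name": "react", "version": "18.2.0"}]}
        ),
    }
    print(affected_services(sboms, "log4j"))
```

In practice an SBOM hub such as Dependency-Track answers this query across all services; the point of the sketch is that the answer is only as good as the SBOMs being current.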

Compliance Mapping

Match controls to frameworks:

| Framework | Control | Supply chain answer |
| --- | --- | --- |
| SOC 2 CC6.6 | Logical access | Signed images + admission policy |
| SOC 2 CC8.1 | Change management | SLSA provenance + GitOps |
| NIST 800-218 PS.3.1 | Archive software | SBOM + immutable tags + signed |
| NIST 800-218 PW.4.1 | Approved deps | Lock files + scan in CI |
| EU CRA | Vulnerability handling | Dependency-Track + VEX + patch SLA |
| EO 14028 | SBOM for federal | Syft-generated SBOM, signed, retained |

A single supply chain practice answers many controls. Document the mapping; auditors are grateful.

What to Bypass and What Not To

The pipeline must remain operable under stress. Decide upfront:

  • Emergency hotfix: still goes through CI + signing, even if expedited.
  • Off-hours deploy without approver: explicit break-glass procedure, logged, reviewed next business day.
  • Signing infrastructure down: cache last good signatures; don't route around signing entirely.
  • Sigstore public infrastructure down: plan a fallback in advance, such as a self-hosted Rekor/Fulcio deployment or a vendor-hosted instance.

Decide before you need to. "We'll figure it out when it breaks" produces unsigned production deploys.

Common Pitfalls

Theater without enforcement. Signing every image, but admission doesn't verify. Verify before celebrating.

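Identity not pinned correctly. A check that only confirms "an image is signed" says nothing about who signed it, and a policy that accepts any identity accepts an attacker's signature too. Always pin to a specific OIDC issuer + identity regexp.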
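A small wrapper can refuse to build a verification command that does not pin identity. This sketch assembles arguments for `cosign verify` using its `--certificate-oidc-issuer` and `--certificate-identity-regexp` flags. The wrapper function, the set of rejected patterns, and the `example-org` names in the usage example are hypothetical; the digest is left as a placeholder.

```python
# Patterns that match any signer and therefore pin nothing.
WILDCARD_PATTERNS = {"", ".*", ".+", "^.*$", "^.+$"}


def cosign_verify_argv(image_ref: str, issuer: str, identity_regexp: str) -> list:
    """Build an argv list for `cosign verify` with issuer and identity pinned."""
    if identity_regexp.strip() in WILDCARD_PATTERNS:
        raise ValueError("identity regexp matches any signer; pin it")
    if not issuer.startswith("https://"):
        raise ValueError("OIDC issuer must be an https URL")
    if "@sha256:" not in image_ref:
        # Tags are mutable; verify the exact digest that will be deployed.
        raise ValueError("verify by digest, not by tag")
    return [
        "cosign", "verify",
        "--certificate-oidc-issuer", issuer,
        "--certificate-identity-regexp", identity_regexp,
        image_ref,
    ]


if __name__ == "__main__":
    argv = cosign_verify_argv(
        image_ref="registry.example.com/app@sha256:<digest>",
        issuer="https://token.actions.githubusercontent.com",
        identity_regexp=r"^https://github\.com/example-org/app/\.github/workflows/release\.yml@refs/tags/v.*$",
    )
    print(" ".join(argv))
```

The same issuer and identity values belong in the admission policy, so that CI and the cluster agree on who is allowed to sign.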

SBOM stale or wrong tool. Building a fresh SBOM is fast; using one from last quarter for today's image is meaningless.

Blocking on every CVE. Critical and high findings might block; medium findings should be rate-limited (scheduled for a fix rather than blocking); low findings should only be tracked. Otherwise you block production for cosmetic issues.
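The severity tiers can be written down as an explicit policy table so the gate behaves the same in every pipeline. This is a sketch only: the action names mirror the wording above, and treating an unknown severity as "track" is an assumption that a stricter team might change to "block".

```python
from collections import Counter

# Severity-to-action table taken from the tiers described above.
POLICY = {
    "critical": "block",
    "high": "block",
    "medium": "rate-limit",
    "low": "track",
}
DEFAULT_ACTION = "track"  # assumption: unknown severities are tracked, not blocked


def plan(severities) -> Counter:
    """Count how many findings fall under each action."""
    return Counter(POLICY.get(s.lower(), DEFAULT_ACTION) for s in severities)


def should_block(severities) -> bool:
    """True if any finding maps to the blocking tier."""
    return plan(severities)["block"] > 0


if __name__ == "__main__":
    scan = ["low", "medium", "low", "HIGH"]
    print(plan(scan), should_block(scan))
```

Keeping the table in one shared module or config file means a change in risk appetite is a one-line, reviewable diff.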

Forgetting base images. Your code is clean. The base image has 30 CVEs. Plan base image refresh as a continuous task.

Treating signing as a one-time effort. Signing infrastructure breaks (cert expiry, registry change, etc.). Monitor signing success rate the way you monitor build success rate.

Hidden direct-pulls. A CI step that does pip install from PyPI bypasses your proxy and signing chain. Audit the actual network calls.

Verification-only in CI, not on deploy. Verifying in CI proves the build is clean. Verifying at admission proves what's deployed is clean. Do both.

Continuous Improvement

The threat landscape evolves. Treat supply chain security as a continuous practice:

  • Quarterly threat model review
  • Pen tests targeting the build pipeline
  • Subscribe to OSV, GitHub Security advisories
  • Monitor Sigstore project security advisories
  • Keep tooling (cosign, syft, grype) updated; old versions miss new vuln formats

Checklist

Supply chain security production readiness:

  • Every production image is signed
  • Admission policy verifies signatures (Kyverno / Sigstore Policy Controller)
  • Identity expectations pinned (OIDC issuer + identity regex)
  • SBOM generated per build, attached as attestation
  • SBOM hub (Dependency-Track / Anchore) ingesting all SBOMs
  • Vulnerability scan in CI; critical/high blocks merge
  • VEX statements for accepted-risk CVEs
  • SLSA provenance attestations generated (Level 2+)
  • Caching proxy for upstream registries
  • Lockfile + integrity hashes for all package managers
  • CI: ephemeral runners, OIDC, pinned Actions by SHA
  • Keyless signing or HSM-backed keys (no long-lived plaintext keys)
  • Registry: immutable tags, scanning, replication
  • Documented incident response: "we found CVE X, how do we know who's affected?"
  • Quarterly threat model review and metric review

What's Next

You have a supply chain practice. Connect it to:

  • CI/CD — signing belongs in the pipeline
  • Policy as Code — admission policies enforce signatures
  • GitOps — Git is the trusted source; signatures bind artifacts to source
  • Secrets — Vault holds CI tokens and signing keys (if any)
  • Internal Developer Platforms — golden paths emit signed-by-default services
