Best Practices

These principles apply whether you're on GitHub Actions, GitLab CI, CircleCI, Jenkins, or anything else. The platform syntax differs; the goals don't.

Speed Is a Feature

Slow CI strangles teams. Aim for PR feedback in under 10 minutes. Beyond that, developers context-switch, batch up changes, stop running the checks locally, and ship slower.

Profile First

Don't guess where time goes. Most platforms surface per-step timing — find the slowest steps and attack those. Common offenders:

Slowdown	Fix
Dependency install from scratch every run	Caching (npm, pip, Go modules, Cargo, ...)
Single-process test runs	Parallelize across machines or shard the test suite
Re-builds of everything on every commit	Build caching (Docker layer cache, build tool cache)
Sequential stages where DAG would do	`needs:` to declare explicit job dependencies (same keyword in GitLab CI and GitHub Actions)
Hitting a slow registry / package mirror	Pull-through cache; geographically nearer mirror
Full E2E suite on every PR	Gate E2E behind PR labels / nightly; smaller smoke tests on PRs
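
As a minimal sketch of this kind of profiling, the snippet below ranks steps by their share of total pipeline time. The step names and durations are made up for illustration; real numbers would come from your platform's per-step timing.

```python
def rank_steps(durations_s):
    """Return (step, seconds, share_of_total) tuples, slowest first."""
    total = sum(durations_s.values())
    if total == 0:
        return []
    ranked = sorted(durations_s.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, secs, secs / total) for name, secs in ranked]


# Hypothetical timings, in seconds, for one pipeline run.
example = {"checkout": 10, "install": 240, "build": 150, "test": 400}

for name, secs, share in rank_steps(example):
    print(f"{name:<10} {secs:>5}s  {share:.0%}")
```

The steps at the top of the list are the ones worth optimizing first; shaving seconds off a step that takes 1% of the run rarely pays off.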

Cache Smart

Cache deterministic things (dependencies pinned by lockfile, base images, build tool caches). Don't cache side effects (test output, dynamic config). The cache key should change exactly when the cacheable content should change:

# Good: lockfile change invalidates
key: npm-${{ hashFiles('package-lock.json') }}

# Bad: branch-based; stale caches forever
key: npm-${{ github.ref_name }}

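Cache storage is bounded. GitHub Actions caps total cache size per repository and evicts the least recently used entries once the cap is reached; on GitLab, retention depends on how the runner's cache storage is configured. Don't cache 5 GB of node_modules per branch; cache only the paths you actually need.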
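
To make the key rule concrete, here is a small sketch of what a content-derived key does. The function name `cache_key` and the lockfile contents are illustrative assumptions, not any platform's actual implementation.

```python
import hashlib


def cache_key(prefix, lockfile_bytes):
    """Derive a cache key from the lockfile contents, not from the branch."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"{prefix}-{digest[:16]}"


# Made-up lockfile contents for illustration.
lock_v1 = b'{"example-package": "1.0.0"}'
lock_v2 = b'{"example-package": "1.0.1"}'

print(cache_key("npm", lock_v1) == cache_key("npm", lock_v1))  # True: same lockfile, same key
print(cache_key("npm", lock_v1) == cache_key("npm", lock_v2))  # False: lockfile changed
```

The key is stable across branches and runs as long as the lockfile is byte-identical, and it changes as soon as a dependency version does.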

Parallelize Wisely

Two axes:

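Job-level: jobs with no dependency between them can run in parallel (GitHub Actions does this by default; GitLab runs the jobs within a stage in parallel and the stages in order). Use needs: to express only the real dependencies.
Test-level: split your test suite across N runners. Jest has --shard and pytest has the pytest-split plugin. Go has no built-in shard flag (-parallel only controls concurrency inside a single go test process), so split the package list across runners instead.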

Test sharding pays off above ~5 min of test runtime. Below that, splitting just adds overhead.
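
One way to split a suite evenly is a greedy assignment by historical duration. The sketch below assumes you have per-file timings from earlier runs; the file names and numbers are hypothetical.

```python
def shard_tests(durations_s, shard_count):
    """Greedy split: place each test file, slowest first, on the lightest shard."""
    shards = [[] for _ in range(shard_count)]
    loads = [0.0] * shard_count
    # Sort by duration (descending), then name, so every runner computes the same plan.
    for name, secs in sorted(durations_s.items(), key=lambda kv: (-kv[1], kv[0])):
        lightest = loads.index(min(loads))
        shards[lightest].append(name)
        loads[lightest] += secs
    return shards, loads


# Hypothetical timings, in seconds, from a previous run.
timings = {"test_api.py": 120, "test_db.py": 90, "test_ui.py": 60, "test_auth.py": 30}
shards, loads = shard_tests(timings, 2)

for index, (files, load) in enumerate(zip(shards, loads)):
    print(f"shard {index}: {files} ({load:.0f}s)")
```

Because the plan is deterministic, each runner can compute it independently and run only its own shard, with no coordination step.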

Security

No Long-Lived Cloud Credentials

The number-one win of the last few years: OIDC (OpenID Connect) federation between the CI platform and the cloud provider.

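# GitHub Actions
permissions:
  id-token: write   # lets the job request an OIDC token
  contents: read

# (inside the job's steps)
- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/gha-deploy   # placeholder account ID
    aws-region: us-east-1   # required input; placeholder region

# GitLab CI
deploy:
  id_tokens:
    AWS_TOKEN: { aud: sts.amazonaws.com }
  script:
    - aws sts assume-role-with-web-identity ...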

The CI platform mints a short-lived JWT; AWS/GCP/Azure trusts it under specific conditions (this repo, this branch, this environment). Stop storing AWS_ACCESS_KEY_ID in CI secrets.
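
The "specific conditions" are checks on the token's claims. The sketch below is a conceptual model only: the cloud provider performs this evaluation (along with signature, issuer, and expiry validation, which are omitted here), and the function name and policy are illustrative assumptions. The subject strings follow the repo:OWNER/REPO:ref:... shape that GitHub documents for its tokens.

```python
def is_trusted(claims, allowed_subjects, expected_audience):
    """Check decoded token claims against the trust conditions.

    Signature, issuer, and expiry validation are out of scope here; a real
    verifier must do those first.
    """
    return (
        claims.get("aud") == expected_audience
        and claims.get("sub") in allowed_subjects
    )


# Hypothetical policy: only the main branch of myorg/api may assume the role.
allowed = {"repo:myorg/api:ref:refs/heads/main"}

main_claims = {"aud": "sts.amazonaws.com", "sub": "repo:myorg/api:ref:refs/heads/main"}
fork_claims = {"aud": "sts.amazonaws.com", "sub": "repo:someone/api:ref:refs/heads/main"}

print(is_trusted(main_claims, allowed, "sts.amazonaws.com"))  # True
print(is_trusted(fork_claims, allowed, "sts.amazonaws.com"))  # False
```

The narrower the allowed subjects, the less a stolen or misused token can do: a token minted for a feature branch or a fork simply does not match.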

Limit Secret Access

Both GitHub Actions and GitLab CI distinguish several scopes:

Repository-level secrets: visible to any workflow / pipeline in the repo.
Environment-level secrets: visible only when the job declares that environment.
Protected branch / tag restrictions: secrets load only for pipelines running on protected refs, so a PR from a fork doesn't get them.

A common pattern: production secrets are environment-scoped to production, and the environment requires reviewer approval. A malicious or accidental change can't reach production secrets without a human gate.

Watch What Untrusted Code Can Do

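Pull requests from forks can execute code in your CI. Both platforms default to not exposing secrets to fork PRs (on GitHub this holds for the pull_request trigger; pull_request_target runs with the base repository's secrets, so don't check out and run PR code under it), but the runner itself still executes attacker-controlled code. Mitigations:

Run untrusted PR builds on separate, sandboxed runners or on platform-hosted runners, never on the self-hosted runners your trusted jobs use.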
Treat the PR's pipeline as "lint + build + test only" — no deploy.
Manually trigger trusted deploy pipelines after review (or after the PR merges).

Pin Third-Party Actions / Templates

# BAD: latest, could change tomorrow
- uses: some-marketplace/awesome-action@main

# OK: tag (mutable)
- uses: some-marketplace/awesome-action@v3

# BEST: commit SHA (immutable)
- uses: some-marketplace/awesome-action@a1b2c3d4e5f6...

A compromised action runs in your pipeline with your secrets. Pin to a SHA for security-sensitive actions (cloud-credential setup, deploy steps, anything with id-token: write).
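
A pinning policy is easy to check mechanically. The following sketch scans workflow text for uses: references that are not a full 40-character commit SHA; the regular expressions are a simplification that assumes the common owner/repo@ref form, and the sample workflow is hypothetical.

```python
import re

# Matches "uses: owner/repo@ref" or "uses: owner/repo/path@ref".
USES_RE = re.compile(r"^\s*-?\s*uses:\s*([\w.\-/]+)@(\S+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return (action, ref) pairs whose ref is not a full 40-character commit SHA."""
    return [
        (action, ref)
        for action, ref in USES_RE.findall(workflow_text)
        if not FULL_SHA_RE.match(ref)
    ]


sample = """\
steps:
  - uses: some-marketplace/awesome-action@main
  - uses: some-marketplace/awesome-action@v3
  - uses: some-marketplace/awesome-action@0123456789abcdef0123456789abcdef01234567
"""

for action, ref in unpinned_actions(sample):
    print(f"not pinned to a SHA: {action}@{ref}")
```

Run as a lint step, a check like this turns the pinning rule from a convention into something a PR cannot silently violate.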

Audit the Logs

Both platforms log every workflow execution. Ship those logs off-platform to your SIEM / ELK. Watch for:

Workflows triggered by forks attempting access to protected resources.
Sudden spikes in secret access.
Self-hosted runners showing up that aren't yours.

Pipeline Design

One Immutable Artifact

The cardinal rule. Build once in CI, deploy that exact artifact (with digest!) to every environment.

# CI
- build image → push as ghcr.io/myorg/api@sha256:abc...

# Deploy staging (deployment/api is a placeholder resource name)
- kubectl set image deployment/api api=ghcr.io/myorg/api@sha256:abc...

# Deploy production
- kubectl set image deployment/api api=ghcr.io/myorg/api@sha256:abc...

If staging and production deploy different builds, you're not testing what you ship. Use digests, not tags, to pin.
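
A deploy script can refuse anything that is not digest-pinned. This is a minimal sketch with a deliberately simple pattern; the function name and the placeholder digest are assumptions for illustration.

```python
import re

DIGEST_REF_RE = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref):
    """True only for references of the form name@sha256:<64 hex chars>."""
    return bool(DIGEST_REF_RE.match(image_ref))


digest = "a" * 64  # placeholder digest for illustration
print(is_digest_pinned(f"ghcr.io/myorg/api@sha256:{digest}"))  # True
print(is_digest_pinned("ghcr.io/myorg/api:v1.4.2"))            # False
print(is_digest_pinned("ghcr.io/myorg/api:latest"))            # False
```

A tag can be moved to point at a different image after you tested it; a digest cannot, which is why the check rejects even version-looking tags.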

Promote Through Environments

PR        → build, lint, test
main      → build, deploy to staging, run smoke tests
release   → promote (re-tag) staging image, deploy to production

Promotions are fast because they don't rebuild — they just re-tag and re-deploy. The same @sha256:abc... that ran in staging for 48 hours now runs in production.

Use Environment Approvals as the Production Gate

GitHub Environments and GitLab CI Environments both support manual approval gates. Require them for production:

Reviewer must approve before the job runs.
The approval is logged with the workflow run.
A deploy can't sneak through outside of working hours by accident.

Roll Forward, Not Back, by Default

When a deploy is bad:

Preferred: revert the commit in git → CI builds the reverted code and deploys it through the normal pipeline. Same controls, same audit trail.
Emergency: kubectl rollout undo or equivalent — fast, but skips your CI gates. Document each instance.

Don't develop a culture of cluster-side hotfixes; they accumulate and become tech debt.

Avoid Pipeline-as-Logic

CI YAML is a config language, not a program. When you find yourself doing:

Loops in YAML
Conditional if chains 10 deep
Shell scripts spread across 5 inline run: blocks

… extract that logic into a real script (./scripts/deploy.sh, ./scripts/release.py) that humans can run locally. The CI pipeline just calls it, so you can reproduce CI failures on your laptop.
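
As a rough illustration of what such a script might look like, the sketch below builds a deploy command from two arguments and prints it instead of executing it. The deployment name, the kubectl context names, and the placeholder digest are all assumptions; adapt them to your own setup.

```python
import argparse
import shlex


def build_deploy_command(environment, image_ref):
    """Return the kubectl command as an argument list."""
    return [
        "kubectl", "--context", environment,  # assumes contexts named after environments
        "set", "image", "deployment/api", f"api={image_ref}",
    ]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Deploy one image to one environment.")
    parser.add_argument("environment", choices=["staging", "production"])
    parser.add_argument("image_ref")
    args = parser.parse_args(argv)
    # Print rather than execute so the sketch is safe to run anywhere.
    print(shlex.join(build_deploy_command(args.environment, args.image_ref)))


# Example invocation with a placeholder digest.
main(["staging", "ghcr.io/myorg/api@sha256:" + "a" * 64])
```

The pipeline step then shrinks to a single call to the script, and the branching logic lives somewhere you can unit-test and debug.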

Deployment Strategies

A short tour of the patterns CI/CD enables:

Strategy	How
Rolling update	Replace N old pods with new, gradually (Kubernetes Deployment default)
Blue/Green	Stand up new version completely; switch traffic atomically
Canary	New version takes 1%/5%/25% of traffic; monitor; expand
Feature flags	Deploy code but keep features off; flip per user / org

Feature flags decouple deployment from release. A risky feature can be deployed dark, enabled for one internal user, then ramped up — even though it's the same artifact in production all along.
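
A percentage ramp is usually implemented by hashing the user into a stable bucket. The sketch below shows the idea under simple assumptions; the flag name and user IDs are hypothetical, and real flag services add targeting rules on top.

```python
import hashlib


def is_enabled(flag, user_id, rollout_percent):
    """Deterministically place a user in a bucket 0-99 and compare to the rollout."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent


users = [f"user-{n}" for n in range(1000)]
for percent in (0, 5, 25, 100):
    enabled = sum(is_enabled("new-checkout", user, percent) for user in users)
    print(f"{percent:>3}% rollout: {enabled} of {len(users)} users enabled")
```

Because the bucket depends only on the flag and the user, a user who sees the feature at 5% keeps seeing it at 25%, and ramping never requires a new deploy.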

Observability of the Pipeline

Track CI/CD itself, not just what it ships:

Metric	Why
Pipeline success rate	Trending down = something's flaky; investigate
P50 / P95 pipeline duration	The user feedback loop
Deploys per day	Deployment frequency (DORA); high-performing teams deploy frequently
Lead time for changes	Commit → running in production (DORA)
Mean time to recovery (MTTR)	Deploy broke → service restored (DORA)
Change failure rate	% of deploys that caused a failure needing a rollback or hotfix (DORA)

The four DORA metrics (the last four rows above) correlate strongly with team performance. Instrument them.
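
If you export deploy records from your pipeline, the four numbers take only a few lines to compute. The record shape below is an assumption made for this sketch, and the function expects at least one deploy.

```python
from datetime import datetime, timedelta
from statistics import median


def dora_summary(deploys, window_days):
    """Summarize deploy records.

    Each record is a dict with committed_at and deployed_at (datetimes),
    failed (bool), and restored_at (datetime or None).
    """
    lead_times = [d["deployed_at"] - d["committed_at"] for d in deploys]
    failures = [d for d in deploys if d["failed"]]
    recoveries = [d["restored_at"] - d["deployed_at"] for d in failures if d["restored_at"]]
    mttr = sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_recovery": mttr,
    }


# Hypothetical records covering a two-day window.
t0 = datetime(2024, 1, 1, 9, 0)
h = timedelta(hours=1)
records = [
    {"committed_at": t0, "deployed_at": t0 + 1 * h, "failed": False, "restored_at": None},
    {"committed_at": t0 + 2 * h, "deployed_at": t0 + 5 * h, "failed": True,
     "restored_at": t0 + 5 * h + timedelta(minutes=30)},
    {"committed_at": t0 + 24 * h, "deployed_at": t0 + 26 * h, "failed": False, "restored_at": None},
    {"committed_at": t0 + 30 * h, "deployed_at": t0 + 34 * h, "failed": False, "restored_at": None},
]

for metric, value in dora_summary(records, window_days=2).items():
    print(f"{metric}: {value}")
```

Tracking the median lead time rather than the mean keeps one unusually slow change from hiding the typical experience.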

Pipeline Hygiene

A handful of habits:

Pin tool versions. Node 20, Python 3.12, Terraform 1.7.5: a fixed version, ideally down to the patch level, not latest.
Reproduce locally. Whatever CI does, you should be able to run on your laptop.
Treat flaky tests as bugs. Each flake erodes trust. Quarantine, then fix, then re-enable.
Fail loudly. Use set -euo pipefail in shell scripts and fail-fast in matrix jobs.
Don't disable failing tests. If you must, file a ticket and put a date on the skip.
Run security scans in CI. SAST, dependency scan, container image scan — all platforms have built-in or integrated options.
One-button rollback. Make it easier than rolling forward when needed.
Document the deployment process in the repo. A new engineer should follow it without Slack help.
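
Flaky tests can be spotted from CI history: a test that both passed and failed on the same commit did not fail because of the code. The sketch below assumes results are available as (commit, test name, passed) tuples; the sample data is made up.

```python
from collections import defaultdict


def find_flaky(results):
    """Return tests that both passed and failed on the same commit.

    `results` is an iterable of (commit, test_name, passed) tuples.
    """
    outcomes = defaultdict(set)
    for commit, test_name, passed in results:
        outcomes[(commit, test_name)].add(passed)
    return sorted({name for (_, name), seen in outcomes.items() if seen == {True, False}})


# Hypothetical history: test_checkout was retried on commit abc123 and then passed.
history = [
    ("abc123", "test_login", True),
    ("abc123", "test_checkout", False),
    ("abc123", "test_checkout", True),
    ("def456", "test_login", False),
    ("def456", "test_checkout", True),
]

print(find_flaky(history))  # ['test_checkout']
```

Note that test_login is not reported: it failed on a different commit than the one it passed on, which points to a real regression rather than a flake.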

Checklist
