Steven's Knowledge

Patterns

Golden paths, scorecards, RBAC, secrets workflows, Crossplane for cloud resources, multi-cluster routing

The patterns that turn a Backstage install into a real internal platform.

Golden Paths (Paved Roads)

A golden path is a templated, opinionated way to do a common task. Each one handles 80% of cases end-to-end:

| Path | What it covers |
|---|---|
| New service | repo + CI + manifests + dashboards + catalog entry + ownership |
| New API endpoint | OpenAPI spec + handler + rate limit + auth + docs |
| New database | Postgres in Crossplane + creds in Vault + ESO binding + migration scaffold |
| New cache | Redis instance + config injection + Grafana dashboard |
| New cron job | K8s CronJob + alerting + observability + runbook scaffold |
| Add a new region | replication config + DNS + monitoring |

Each path is a Backstage Template plus a series of orchestrated actions. Engineers who follow the path get consistency, observability, security, and operability handled by default.
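
As a rough illustration of "a Template plus a series of orchestrated actions", the sketch below models a golden path as an ordered list of steps that share one context. The `Step` type, the `runPath` helper, and the step names are hypothetical; real Backstage templates declare their steps in YAML and run them through scaffolder actions.

```typescript
// Hypothetical model of a golden path: ordered steps sharing a context.
// This is not the Backstage scaffolder API, only an illustration of the idea.
type PathContext = { service: string; owner: string; outputs: Record<string, string> };

type Step = {
  name: string;
  run: (ctx: PathContext) => string; // returns a short description of what was produced
};

function runPath(steps: Step[], ctx: PathContext): string[] {
  const completed: string[] = [];
  for (const step of steps) {
    // Each step records its output so later steps can reference it.
    ctx.outputs[step.name] = step.run(ctx);
    completed.push(step.name);
  }
  return completed;
}

// The "New service" path, reduced to three illustrative steps.
const newServicePath: Step[] = [
  { name: 'repo', run: ctx => `repo created for ${ctx.service}` },
  { name: 'ci', run: ctx => `CI added (${ctx.outputs['repo']})` },
  { name: 'catalog', run: ctx => `catalog entry owned by ${ctx.owner}` },
];

const context: PathContext = { service: 'checkout', owner: 'team-payments', outputs: {} };
const done = runPath(newServicePath, context);
```

Because every step writes into the shared context, a later step (here `ci`) can build on what an earlier one produced, which is what lets one form submission drive the whole path.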

Scorecards

A scorecard is a function from a service to a grade:

# Illustrative pseudo-config in the style of Backstage Soundcheck (or OpsLevel / Cortex);
# the exact rule syntax differs per tool
checks:
  - id: has-readme-runbook
    severity: medium
    rule: |
      readme.content contains "## Runbook"

  - id: has-on-call
    severity: high
    rule: |
      entity.spec.owner has pagerduty integration

  - id: deploys-recently
    severity: medium
    rule: |
      lastDeployedAt within 30 days

  - id: no-critical-cves
    severity: critical
    rule: |
      image.scan.criticalCount == 0

Aggregated:

| Service | Grade | Open gaps |
|---|---|---|
| checkout-service | A | None |
| payments-service | B | Missing runbook |
| reports-service | D | No on-call, stale deploy, 2 CVEs |

Roll up by team, by lifecycle stage, by system. The conversation shifts from "are we doing things right?" to "here's our objective gap analysis."
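
To make "a function from a service to a grade" concrete, here is a minimal sketch of the four checks as predicates plus a grading function. The `ServiceFacts` shape, the penalty weights, and the grade cut-offs are assumptions chosen so the results line up with the aggregated example; they are not taken from Soundcheck, OpsLevel, or Cortex.

```typescript
// Hypothetical scorecard: each check is a predicate over known service facts.
type Severity = 'medium' | 'high' | 'critical';

type ServiceFacts = {
  name: string;
  readme: string;
  hasOnCall: boolean;
  daysSinceLastDeploy: number;
  criticalCves: number;
};

type Check = { id: string; severity: Severity; passes: (s: ServiceFacts) => boolean };

const checks: Check[] = [
  { id: 'has-readme-runbook', severity: 'medium', passes: s => s.readme.indexOf('## Runbook') !== -1 },
  { id: 'has-on-call', severity: 'high', passes: s => s.hasOnCall },
  { id: 'deploys-recently', severity: 'medium', passes: s => s.daysSinceLastDeploy <= 30 },
  { id: 'no-critical-cves', severity: 'critical', passes: s => s.criticalCves === 0 },
];

// Assumed weights and cut-offs; tune these to your own policy.
const penalty: Record<Severity, number> = { medium: 1, high: 2, critical: 3 };

function gradeService(service: ServiceFacts): { grade: string; gaps: string[] } {
  const failed = checks.filter(c => !c.passes(service));
  const score = failed.reduce((sum, c) => sum + penalty[c.severity], 0);
  const letter = score === 0 ? 'A' : score <= 1 ? 'B' : score <= 3 ? 'C' : 'D';
  return { grade: letter, gaps: failed.map(c => c.id) };
}

const payments = gradeService({
  name: 'payments-service',
  readme: '# Payments',
  hasOnCall: true,
  daysSinceLastDeploy: 3,
  criticalCves: 0,
});
```

Returning the failed check ids alongside the letter keeps the grade explainable: the same result feeds both the roll-up and the "open gaps" column.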

Phase scorecards in slowly:

  1. Visible only, for one quarter: gather a baseline
  2. Highlighted in team reviews: pressure without punishment
  3. Required for the production lifecycle: block the "lifecycle: production" tag unless the grade is B or better
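
The three phases can be expressed as a small gate. This is a sketch only: the phase names and the `mayTagProduction` helper are hypothetical, and the grade scale (A through F) is an assumption.

```typescript
// Hypothetical rollout phases for a scorecard.
type Phase = 'visible' | 'highlighted' | 'required';

// Assumed grade scale, best first.
const gradeOrder = ['A', 'B', 'C', 'D', 'F'];

// Returns true when a service may carry the "lifecycle: production" tag.
function mayTagProduction(phase: Phase, letter: string): boolean {
  if (phase !== 'required') return true; // phases 1 and 2 never block
  const rank = gradeOrder.indexOf(letter);
  // Unknown grades are rejected; otherwise the grade must be B or better.
  return rank !== -1 && rank <= gradeOrder.indexOf('B');
}
```

Only the final phase can block anything, so teams see their grades for a long time before the grade has consequences.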

Crossplane for Cloud Resources

Engineers shouldn't write Terraform for routine resources. Wrap cloud provisioning in K8s CRDs via Crossplane:

# Engineer writes this:
apiVersion: platform.example.com/v1
kind: PostgresInstance
metadata:
  name: checkout-db
  namespace: payments
spec:
  size: small         # → platform decides instance type
  storageGB: 20
  highAvailability: false

The Backstage scaffolder creates this resource as part of a "new database" template. Crossplane reconciles it into:

  • An RDS instance with the right size, region, and security group
  • DB credentials in Vault
  • External Secrets Operator binding to a K8s Secret
  • Backup policy
  • Tags for FinOps attribution
  • An entry in the service catalog

The engineer learns nothing about RDS provisioning, and that is by design: the platform team encodes the right shape once.
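
The "platform decides instance type" step is the heart of this pattern. In Crossplane that mapping would normally live in a Composition; the TypeScript below only illustrates the decision itself. The `toRdsParams` helper, the size-to-class table, and the tag keys are assumptions for illustration.

```typescript
// Fields the engineer sets on the PostgresInstance claim.
type PostgresClaim = {
  name: string;
  size: 'small' | 'medium' | 'large';
  storageGB: number;
  highAvailability: boolean;
};

// Provider-level parameters the engineer never sees (hypothetical shape).
type RdsParams = {
  instanceClass: string;
  allocatedStorage: number;
  multiAZ: boolean;
  tags: Record<string, string>;
};

// Assumed size-to-class table; the platform team owns and changes it.
const instanceClassBySize: Record<PostgresClaim['size'], string> = {
  small: 'db.t3.small',
  medium: 'db.m5.large',
  large: 'db.m5.2xlarge',
};

function toRdsParams(claim: PostgresClaim, namespace: string): RdsParams {
  return {
    instanceClass: instanceClassBySize[claim.size],
    allocatedStorage: claim.storageGB,
    multiAZ: claim.highAvailability,
    tags: { service: claim.name, team: namespace }, // FinOps attribution
  };
}

const rds = toRdsParams(
  { name: 'checkout-db', size: 'small', storageGB: 20, highAvailability: false },
  'payments',
);
```

Because the table sits on the platform side, changing what "small" means is a platform change, not a change in every application repo.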

Secrets Workflows

Self-service secrets are a common platform need:

[Engineer asks for a new env var SLACK_TOKEN]
  → Scaffolder template "add a secret"
  → Form: secret name, environment, value (sealed in browser)
  → Backstage creates: Vault path + ExternalSecret + reference in deployment
  → PR opened to GitOps repo
  → Engineer approves + merges

The secret value never appears in plain text in any system the engineer touches. The workflow provides:

  1. Browser-side encryption to the cluster's Sealed Secrets / Vault transit key.
  2. No platform engineer in the loop for routine secret creation.
  3. Audit trail: who requested, what secret, when, in which env.
  4. Rotation: a "rotate this secret" button in the catalog.
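
The "Backstage creates" step might generate something like the following. This is a sketch under several assumptions: the Vault path convention, the store name, the namespace choice, and the helper names are hypothetical, and the ExternalSecret `apiVersion` depends on the External Secrets Operator release you run.

```typescript
type SecretRequest = { name: string; service: string; environment: string; requestedBy: string };

// Assumed Vault path convention: <environment>/<service>.
function vaultPath(req: SecretRequest): string {
  return `${req.environment}/${req.service}`;
}

// Builds an ExternalSecret manifest. No secret value is involved here:
// the manifest only points at where the value lives in Vault.
function externalSecretFor(req: SecretRequest, storeName: string) {
  const k8sName = `${req.service}-${req.name.toLowerCase().replace(/_/g, '-')}`;
  return {
    apiVersion: 'external-secrets.io/v1beta1', // check against your ESO version
    kind: 'ExternalSecret',
    metadata: { name: k8sName, namespace: req.service }, // namespace is an assumption
    spec: {
      refreshInterval: '1h',
      secretStoreRef: { name: storeName, kind: 'ClusterSecretStore' },
      target: { name: k8sName },
      data: [{ secretKey: req.name, remoteRef: { key: vaultPath(req), property: req.name } }],
    },
  };
}

// Audit record: who requested, what secret, when, in which env.
function auditRecord(req: SecretRequest, at: Date) {
  return { who: req.requestedBy, secret: req.name, env: req.environment, at: at.toISOString() };
}

const request: SecretRequest = {
  name: 'SLACK_TOKEN',
  service: 'checkout',
  environment: 'prod',
  requestedBy: 'alice',
};
const manifest = externalSecretFor(request, 'vault');
```

Since the generated manifest carries only a reference, it is safe to commit to the GitOps repo and review in a PR.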

Multi-Cluster Routing

Engineers shouldn't pick which cluster to deploy to. Platform decides:

# Template ingredient: deploy spec
deployment:
  workload: checkout
  environments:
    - { name: dev, tier: standard }
    - { name: staging, tier: standard }
    - { name: prod-us, tier: tier1, region: us-east-1 }
    - { name: prod-eu, tier: tier1, region: eu-west-1 }

The orchestrator (Humanitec, a Score-aware tool, or a custom controller) maps tier1 + us-east-1 to "production cluster in us-east-1 with enhanced policies." Engineers never write a kubeconfig.

This pattern is the most powerful win of an IDP: decoupling intent from placement. Workloads can move between clusters without application changes.
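
A minimal sketch of the placement decision, assuming a hypothetical cluster inventory and a `placeWorkload` helper (neither comes from Humanitec or Score): the environment states intent, and the resolver picks the cluster.

```typescript
type Tier = 'standard' | 'tier1';
type Environment = { name: string; tier: Tier; region?: string };
type Cluster = { name: string; tier: Tier; region: string; policies: string[] };

// Hypothetical cluster inventory owned by the platform team.
const clusters: Cluster[] = [
  { name: 'shared-dev', tier: 'standard', region: 'us-east-1', policies: ['baseline'] },
  { name: 'prod-use1', tier: 'tier1', region: 'us-east-1', policies: ['baseline', 'enhanced'] },
  { name: 'prod-euw1', tier: 'tier1', region: 'eu-west-1', policies: ['baseline', 'enhanced'] },
];

// Returns the first cluster matching the tier and, when given, the region.
function placeWorkload(env: Environment, inventory: Cluster[]): Cluster {
  for (const cluster of inventory) {
    const regionOk = env.region === undefined || cluster.region === env.region;
    if (cluster.tier === env.tier && regionOk) return cluster;
  }
  throw new Error(`no cluster for ${env.name} (${env.tier}, ${env.region || 'any region'})`);
}

const prodEu = placeWorkload({ name: 'prod-eu', tier: 'tier1', region: 'eu-west-1' }, clusters);
```

Moving a workload then means editing the inventory, not the application's deploy spec.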

RBAC in the Portal

Backstage's permission framework expresses policy as code, so rules can be as coarse or as fine-grained as you need:

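// Permission policy. Import paths follow the Backstage permission framework;
// newer releases may type `user` differently, so check your version.
import { BackstageIdentityResponse } from '@backstage/plugin-auth-node';
import { AuthorizeResult, PolicyDecision, isResourcePermission } from '@backstage/plugin-permission-common';
import { PermissionPolicy, PolicyQuery } from '@backstage/plugin-permission-node';

class MyPermissionPolicy implements PermissionPolicy {
  async handle(request: PolicyQuery, user?: BackstageIdentityResponse): Promise<PolicyDecision> {
    // Only the admin user may delete catalog entities
    if (isResourcePermission(request.permission, 'catalog-entity')) {
      if (request.permission.attributes.action === 'delete') {
        return user?.identity.userEntityRef === 'user:default/admin'
          ? { result: AuthorizeResult.ALLOW }
          : { result: AuthorizeResult.DENY };
      }
    }
    // Everything else, including scaffolding, is allowed
    return { result: AuthorizeResult.ALLOW };
  }
}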

A sensible default for most teams: everyone can read everything, engineers can scaffold only within their own team, and only admins can mutate or delete catalog entities.
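
That default (read everything, scaffold within your own team, admin-only mutation) can be written as a plain decision function. The sketch deliberately avoids Backstage imports so it runs on its own; the `User`, `Resource`, and `decide` names are hypothetical and would need translating into the permission framework's types.

```typescript
type Action = 'read' | 'scaffold' | 'update' | 'delete';
type User = { ref: string; teams: string[]; isAdmin: boolean };
type Resource = { kind: 'catalog-entity' | 'template'; ownerTeam: string };

function decide(user: User, action: Action, resource: Resource): 'ALLOW' | 'DENY' {
  // Read everything by default.
  if (action === 'read') return 'ALLOW';
  // Scaffold only against resources owned by one of the user's teams.
  if (action === 'scaffold') {
    return user.teams.indexOf(resource.ownerTeam) !== -1 ? 'ALLOW' : 'DENY';
  }
  // Update and delete are reserved for admins.
  return user.isAdmin ? 'ALLOW' : 'DENY';
}

const alice: User = { ref: 'user:default/alice', teams: ['payments'], isAdmin: false };
const paymentsTemplate: Resource = { kind: 'template', ownerTeam: 'payments' };
const searchEntity: Resource = { kind: 'catalog-entity', ownerTeam: 'search' };
```

Keeping the rules in one pure function makes them easy to unit test before they are wired into a real policy class.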

Cost Visibility Per Service

Wire FinOps data into the catalog:

# annotations on catalog-info.yaml
metadata:
  annotations:
    finops/cost-tag: 'service=checkout'

A custom Backstage plugin queries Kubecost, Vantage, or your own billing API and shows:

  • Monthly $ per service
  • Trend (up/down/flat)
  • Cost per request / per user
  • Links to optimization recommendations

Engineers who see "your service costs $12k/month" make different decisions than those who see only latency.
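
The numbers such a plugin displays are simple to derive once the billing data is attributed per service. A sketch, assuming a hypothetical `MonthlyCost` record and a 5% band for "flat" (both are illustrative choices, not part of Kubecost or Vantage):

```typescript
type MonthlyCost = { month: string; usd: number; requests: number };

// Trend compares the last two months; changes within +/-5% count as flat (assumed threshold).
function trend(history: MonthlyCost[]): 'up' | 'down' | 'flat' {
  if (history.length < 2) return 'flat';
  const prev = history[history.length - 2].usd;
  const curr = history[history.length - 1].usd;
  if (prev === 0) return curr > 0 ? 'up' : 'flat';
  const change = (curr - prev) / prev;
  if (change > 0.05) return 'up';
  if (change < -0.05) return 'down';
  return 'flat';
}

// Cost in USD per one million requests for a given month.
function costPerMillionRequests(m: MonthlyCost): number {
  return m.requests === 0 ? 0 : (m.usd * 1000000) / m.requests;
}

const checkoutCosts: MonthlyCost[] = [
  { month: '2024-01', usd: 10000, requests: 400000000 },
  { month: '2024-02', usd: 12000, requests: 480000000 },
];
```

In this example the monthly bill rose by 20%, yet the cost per million requests is unchanged, which is why showing both figures side by side matters.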

Templates for Real Tools

Don't just template services. Template:

  • GitHub Actions workflow: "add this standard CI to my repo"
  • Helm chart values overlay: "create a new environment for an existing service"
  • Database migration: "add a new migration for the checkout DB"
  • API client: "generate a typed SDK from this OpenAPI spec"
  • Feature flag: "add a flag with these defaults to the feature-flag system"

Every repetitive task that follows a pattern is a template candidate.

Multi-Tenancy and Federated Catalogs

In a large organization, one Backstage hub serves many contributing teams:

  • Each team owns templates in its own repo
  • Backstage federates: catalog.locations references many template repos
  • Each team's templates appear under its group in the scaffolder
  • Permissions ensure a team's scaffolder workflows touch only that team's resources

The platform team curates the experience (UI consistency, conventions, common plugins). Teams contribute paths specific to their stack.

Self-Service Production Access

A privileged operation done right:

[Engineer needs to investigate prod]
  → Catalog page for service
  → "Request prod shell access" button
  → Form: reason, expected duration
  → Approval workflow (peer or manager)
  → On approval, IDP creates short-lived signed cert / kubeconfig
  → Access expires automatically; all commands logged

This replaces the "DM the SRE for kubectl access" anti-pattern with audited, time-boxed self-service. Tools like Teleport, StrongDM, or homegrown IDP plugins drive this.
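
The time-boxing logic at the core of this flow is small. The sketch below is not how Teleport or StrongDM implement it; the `approve` and `isActive` helpers, the two-hour cap, and the rule against self-approval are assumptions for illustration.

```typescript
type AccessRequest = { engineer: string; service: string; reason: string; minutes: number };
type Grant = { engineer: string; service: string; approvedBy: string; expiresAt: number };

const MAX_MINUTES = 120; // assumed upper bound on a single grant

// Turns an approved request into a grant with a hard expiry (epoch milliseconds).
function approve(req: AccessRequest, approver: string, nowMs: number): Grant {
  if (approver === req.engineer) throw new Error('self-approval is not allowed');
  if (req.reason.trim() === '') throw new Error('a reason is required');
  const minutes = Math.min(req.minutes, MAX_MINUTES);
  return {
    engineer: req.engineer,
    service: req.service,
    approvedBy: approver,
    expiresAt: nowMs + minutes * 60000,
  };
}

// Access is valid only strictly before the expiry time.
function isActive(grant: Grant, nowMs: number): boolean {
  return nowMs < grant.expiresAt;
}

const grant = approve(
  { engineer: 'alice', service: 'checkout', reason: 'investigate 5xx spike', minutes: 30 },
  'bob',
  0,
);
```

Expiry is part of the grant itself, so nothing depends on someone remembering to revoke access afterwards.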

Integration with GitOps

The IDP and GitOps work together:

  • Engineer fills in the scaffolder form → IDP creates the repo and manifests and opens a PR against the GitOps repo
  • PR merges → Argo CD / Flux deploys
  • Argo CD status flows back to Backstage → catalog shows live deployment status

The IDP expresses intent; GitOps enforces it. Don't let the IDP bypass GitOps by running kubectl apply directly, because that breaks both models.

Anti-Patterns

Bottomless catalog. Every team adds entries, no one curates, and the catalog becomes a junkyard. Run a hygiene process: every quarter, quarantine or delete orphaned entries, entries whose owners have left the company, and services with no deploys in a year.

Templates that bypass platform standards. A team writes a template that hardcodes its own conventions, then other teams copy it. Put templates through an approval process; they are part of the platform.

Platform plugins that no one maintains. A plugin shows last deploy time, breaks during an upgrade, and no one notices for a month. Assign an owner to each plugin; broken plugins teach engineers not to trust the platform.

Portal as a wiki. If engineers come to find documentation that should be in the catalog (and find it stale), they stop trusting the portal. The portal must be the source of truth, not a derivative.

Tools without conventions. Backstage without policy means every team answers "what should I tag my service with?" differently. Define conventions; enforce them via scorecards.
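
The hygiene process suggested under "Bottomless catalog" is easy to automate. A minimal sketch, assuming a hypothetical `CatalogEntry` shape fed from the catalog plus HR and deploy data; the one-year threshold comes from the text, while the issue labels are illustrative.

```typescript
type CatalogEntry = {
  name: string;
  owner: string | null; // null means no owner is recorded
  ownerActive: boolean; // false when the owner has left the company
  daysSinceLastDeploy: number | null; // null means deploy data is unavailable
};

function hygieneIssues(entry: CatalogEntry): string[] {
  const issues: string[] = [];
  if (entry.owner === null) {
    issues.push('orphan');
  } else if (!entry.ownerActive) {
    issues.push('owner-left');
  }
  if (entry.daysSinceLastDeploy !== null && entry.daysSinceLastDeploy > 365) {
    issues.push('no-deploys-in-a-year');
  }
  return issues;
}

// Names of entries to quarantine in the quarterly review.
function quarantineList(entries: CatalogEntry[]): string[] {
  return entries.filter(e => hygieneIssues(e).length > 0).map(e => e.name);
}

const catalog: CatalogEntry[] = [
  { name: 'checkout-service', owner: 'team-payments', ownerActive: true, daysSinceLastDeploy: 2 },
  { name: 'legacy-reports', owner: 'bob', ownerActive: false, daysSinceLastDeploy: 500 },
  { name: 'mystery-job', owner: null, ownerActive: false, daysSinceLastDeploy: null },
];
const toQuarantine = quarantineList(catalog);
```

Entries with missing deploy data are not flagged as stale here; whether to treat "unknown" as a problem is a policy choice.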
