Steven's Knowledge

Zero Trust Patterns

ACLs, identity-aware access, MagicDNS, exit nodes, subnet routers, federation, device posture

The handful of patterns that take you from "I installed Tailscale" to a real zero-trust network.

ACLs: Authorization Per Connection

The core of zero trust. Every connection is checked against a policy.

Tailscale ACL Example

{
  "groups": {
    "group:engineers": ["alice@example.com", "bob@example.com"],
    "group:contractors": ["external@partner.com"],
    "group:on-call": ["alice@example.com", "carol@example.com"]
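  },

  "tagOwners": {
    "tag:prod": ["group:on-call"],
    "tag:staging": ["group:engineers"],
    "tag:dev": ["autogroup:members"],
    // Tags referenced in the rules below also need owners (the owners here are assumed)
    "tag:demo": ["autogroup:admin"],
    "tag:ci": ["autogroup:admin"]
  },

  // Host alias for the rule that targets "registry" (placeholder address; use the real one)
  "hosts": {
    "registry": "100.64.0.10"
  },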

  "acls": [
    // Engineers can ssh into staging
    { "action": "accept", "src": ["group:engineers"], "dst": ["tag:staging:22"] },

    // On-call can SSH into prod and reach its web and metrics ports
    { "action": "accept", "src": ["group:on-call"], "dst": ["tag:prod:22,80,443,9090"] },

    // Contractors can only reach the demo box
    { "action": "accept", "src": ["group:contractors"], "dst": ["tag:demo:80,443"] },

    // CI can push to artifact registry
    { "action": "accept", "src": ["tag:ci"], "dst": ["registry:443"] }
  ],

  "ssh": [
    // Only on-call gets Tailscale SSH to prod, as root or ubuntu
    { "action": "accept", "src": ["group:on-call"], "dst": ["tag:prod"], "users": ["root", "ubuntu"] }
  ]
}

Key principles:

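  • Groups should mirror your IdP. The example lists members inline for readability; where your IdP integration supports group sync, removing someone from Google Workspace removes both their group memberships and their ability to sign in to the tailnet.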
  • Tags are machine identity. A device with tag:prod is "a prod server"; no human owns it.
  • Default deny. Anything not explicitly allowed is denied.
  • Source + destination + port. Each rule is specific; no blanket "everyone in".

For traffic that flows over the tailnet, one central policy can replace most of the firewall rules you would otherwise maintain on every server.
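
The default-deny principle above can be sketched in a few lines of Python. This is a toy model for illustration only, not how Tailscale evaluates policies internally; the Rule class and is_allowed function are hypothetical names, and the two rules mirror entries from the example policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src: frozenset    # groups or tags allowed to initiate the connection
    dst: str          # destination tag
    ports: frozenset  # destination ports the rule opens


# Two rules copied from the example policy (hypothetical in-memory form).
RULES = [
    Rule(frozenset({"group:engineers"}), "tag:staging", frozenset({22})),
    Rule(frozenset({"group:on-call"}), "tag:prod", frozenset({22, 80, 443, 9090})),
]


def is_allowed(identities: set, dst_tag: str, port: int) -> bool:
    """Default deny: allow only if some rule matches source, destination, and port."""
    return any(
        bool(rule.src & identities) and rule.dst == dst_tag and port in rule.ports
        for rule in RULES
    )
```

An engineer who is not on call gets is_allowed({"group:engineers"}, "tag:prod", 22) == False simply because no rule says otherwise; nothing has to be written to block them.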

Identity-Aware Access (Cloudflare Access Patterns)

For HTTP services, Cloudflare Access policies layer identity checks in front of the application:

# Access policy for /admin/* (illustrative notation, not Cloudflare's exact schema)
- name: "Admin only"
  decision: allow
  include:
    - email_domain: example.com
  require:
    - groups: ["admins"]      # Okta group
    - country: ["US", "CA"]
    - mfa: true

# Default: deny
- name: "Block all others"
  decision: deny
  include:
    - everyone

Common patterns:

Need                               Policy
Internal app, employees only       email_domain: example.com
Sensitive app, admins only         groups: ["admins"] + mfa: true
Vendor portal                      Specific emails + country filter
Public health-check, no auth       bypass: true for /health path

Access logs every authentication attempt, which gives you a useful audit trail.
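
How the include and require blocks combine can be modeled as follows: a user must match at least one include rule and every require rule. This is a simplified sketch of that matching logic, not Cloudflare's implementation; evaluate, admin_policy, and the shape of the user dictionary are assumptions made for the example.

```python
def evaluate(policy: dict, user: dict) -> bool:
    """Allow when any include rule matches and all require rules match."""
    include_ok = any(check(user) for check in policy["include"])
    require_ok = all(check(user) for check in policy.get("require", []))
    return include_ok and require_ok


# The "Admin only" policy from above, expressed as predicates (hypothetical form).
admin_policy = {
    "include": [lambda u: u["email"].endswith("@example.com")],
    "require": [
        lambda u: "admins" in u["groups"],
        lambda u: u["country"] in {"US", "CA"},
        lambda u: u["mfa"] is True,
    ],
}
```

Because require rules are ANDed, an admin signing in without MFA is rejected even though the include rule matches.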

Service Auth (Service-to-Service Through Access)

For service-to-service through Access (e.g., a Lambda calling your internal API):

# Create a service token in Cloudflare Access
# Then in the service:
curl https://internal.example.com/api \
  -H "CF-Access-Client-Id: $TOKEN_ID" \
  -H "CF-Access-Client-Secret: $TOKEN_SECRET"

Access validates the token and lets the request through to origin.
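
The same call from application code might look like the sketch below, which uses only the Python standard library. The two header names come from the curl example; the function name build_access_request is hypothetical, and the request is only constructed here, not sent.

```python
import urllib.request


def build_access_request(url: str, token_id: str, token_secret: str) -> urllib.request.Request:
    """Attach the Access service-token headers to an outgoing request."""
    return urllib.request.Request(
        url,
        headers={
            "CF-Access-Client-Id": token_id,
            "CF-Access-Client-Secret": token_secret,
        },
    )


# Usage (not executed here): read the token from the environment, never from source.
#   req = build_access_request("https://internal.example.com/api",
#                              os.environ["TOKEN_ID"], os.environ["TOKEN_SECRET"])
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       body = resp.read()
```

Keeping the token in environment variables (or a secrets manager) means rotating it does not require a code change.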

MagicDNS and Stable Names

Tailscale's MagicDNS gives every device a stable name:

laptop.tailb12c34.ts.net          # full
laptop                             # short, within your tailnet

ssh laptop, curl http://my-server:8080, psql -h db-replica — they all work without configuring DNS.

Split DNS is supported too: send queries for your internal *.corp.example.com names to your own resolver over the tailnet.

Subnet Routers (Mesh + Existing Networks)

When you can't install Tailscale on every machine in a network (legacy on-prem, cloud-managed services), use a subnet router:

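# On a server inside the legacy network (on Linux, enable IP forwarding first)
sudo tailscale up --advertise-routes=10.0.0.0/16,192.168.1.0/24
# Approve the routes in the admin UI

Now any tailnet device that your ACLs permit can reach the subnet via that node (Linux clients must also run with --accept-routes). Useful for: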

  • AWS / GCP VPC — drop a Tailscale subnet router in the VPC; engineers reach RDS, internal load balancers, etc.
  • Office network — laptop in a coffee shop reaches the office printer.
  • Legacy data center — old hardware nobody wants to touch becomes mesh-addressable.
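
To check which advertised route would carry traffic for a given address, the standard ipaddress module is enough. This is a planning aid under stated assumptions, not something Tailscale exposes: ADVERTISED repeats the two routes from the command above, route_for is a hypothetical helper, and picking the most specific match follows ordinary IP routing convention.

```python
import ipaddress

# The routes advertised by the subnet router in the example above.
ADVERTISED = [
    ipaddress.ip_network("10.0.0.0/16"),
    ipaddress.ip_network("192.168.1.0/24"),
]


def route_for(address: str):
    """Return the most specific advertised route covering address, or None."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in ADVERTISED if ip in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)
```

An address outside both ranges returns None, which means the subnet router does not carry traffic for it.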

Exit Nodes (Egress IP Control)

Sometimes you need traffic to reach the internet from a specific IP:

  • A third-party API has IP allowlisting.
  • You need to test from a specific country.
  • A corporate proxy is the only outbound path.

# Designate a server as an exit node (then approve it in the admin UI)
sudo tailscale up --advertise-exit-node

# On your client, route traffic through it
tailscale up --exit-node=<server-name>

All of your client's traffic now reaches the internet from that server's IP. It is a lightweight substitute for a forward proxy or NAT gateway when you need a static egress IP.

Service Tokens vs OAuth Clients

For automation:

Approach                           When
Tailscale OAuth client             A service that needs to register devices, manage ACLs, automate via the API
Tailscale auth keys                A device that boots and joins the tailnet automatically (pre-authorized)
Cloudflare Access service token    A service-to-service caller through an Access-protected app

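Auth keys for ephemeral CI runners:

# Pre-authorize: create an ephemeral auth key in the admin UI, then in the CI runner image
sudo tailscale up --authkey=tskey-...

Ephemerality is a property of the auth key, chosen when the key is created, rather than a tailscale up flag. A device that joins with an ephemeral key is removed from the tailnet automatically shortly after it goes offline, which suits short-lived CI workers.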
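
A CI bootstrap script might assemble the join command like the sketch below, so the key comes from the job's secret store instead of being baked into the image. The helper name tailscale_up_args and the environment variable names in the usage comment are assumptions; --authkey and --hostname are the only tailscale up flags used.

```python
def tailscale_up_args(auth_key: str, hostname: str) -> list:
    """Build the argv for joining the tailnet with a pre-created auth key."""
    if not auth_key.startswith("tskey-"):
        raise ValueError("expected a Tailscale auth key (tskey-...)")
    return ["tailscale", "up", f"--authkey={auth_key}", f"--hostname={hostname}"]


# Usage (not executed here; variable names are hypothetical):
#   import os, subprocess
#   args = tailscale_up_args(os.environ["TS_AUTHKEY"], os.environ["CI_RUNNER_NAME"])
#   subprocess.run(["sudo", *args], check=True)
```

Passing the arguments as a list avoids shell interpolation of the key.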

Device Posture

Some zero-trust products evaluate the device's security state at every request:

  • Is the disk encrypted?
  • Is the OS up to date?
  • Is the device enrolled in the corporate MDM?
  • Is antivirus running and current?

If posture fails, access is denied, even for an authenticated user. Tailscale's device posture feature, Cloudflare WARP with device posture checks, and Twingate all do this.

For B2B SaaS targeting enterprises, this is increasingly table stakes.
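
The rule that posture can veto an authenticated user fits in a short sketch. The Posture fields mirror the four questions listed above; the class and function names are hypothetical, and real products collect these signals through their own agents and APIs.

```python
from dataclasses import dataclass


@dataclass
class Posture:
    disk_encrypted: bool
    os_up_to_date: bool
    mdm_enrolled: bool
    antivirus_current: bool


def failed_checks(posture: Posture) -> list:
    """Names of the posture checks that did not pass, in declaration order."""
    return [name for name, ok in vars(posture).items() if not ok]


def grant_access(user_authenticated: bool, posture: Posture) -> bool:
    """Identity alone is not enough: every posture check must pass too."""
    return user_authenticated and not failed_checks(posture)
```

Returning the list of failed checks, not just a boolean, lets the client tell the user what to fix.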

Federation: Connecting Multiple Tailnets

Two organizations need to share a few resources:

  • Each runs its own tailnet.
  • One side advertises a subnet specifically for sharing.
  • The other side accepts that subnet with restricted ACLs.
  • Or: use Tailscale's node sharing to give specific users in the other tailnet direct access to individual machines.

For ad-hoc sharing, exposing a service via Cloudflare Tunnel + Access (with the partner's email allowlisted) is often simpler than mesh-level federation.

Logging and Audit

For compliance and post-incident:

Source                             What it tells you
Tailscale device logs              Connection attempts, ACL decisions
Cloudflare Access logs             Who authenticated to which app, when, from where
Tailscale SSH session recording    Full transcript of SSH sessions (when enabled)
Service logs                       Application-level user actions

Ship these to a SIEM (ELK or a commercial product). Auditors for SOC 2 and ISO 27001 typically expect centralized, retained access logs.
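
Before shipping, it helps to wrap events from every source in one envelope so the SIEM can filter by origin and time. The sketch below assumes nothing about each product's log fields: the raw event is passed through untouched, and the envelope keys (source, received_at, event) and the function name to_siem_line are hypothetical.

```python
import json
from datetime import datetime, timezone


def to_siem_line(source: str, event: dict, received_at: datetime) -> str:
    """Wrap a raw log event in a common envelope and emit one JSON line."""
    record = {
        "source": source,
        "received_at": received_at.astimezone(timezone.utc).isoformat(),
        "event": event,  # left as-is; per-source parsing happens in the SIEM
    }
    return json.dumps(record, sort_keys=True)
```

One JSON object per line is a format most log shippers can ingest without extra configuration.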

When Mesh Hits Limits

A few cases where mesh VPN doesn't cleanly apply:

  • Site-to-site between two networks you don't fully control — sometimes a direct WireGuard or IPsec tunnel is simpler than figuring out federation.
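  • Hundreds of devices with tight latency budgets: direct peer-to-peer paths help, but the worst-case path (traffic relayed through a DERP server when no direct path can be established) is what you have to budget for.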
  • Compliance environments forbidding SaaS coordination — use Headscale (self-hosted control plane) or plain WireGuard.
  • Extreme scale (10,000+ devices) — Tailscale handles it but pricing/ops shift.

For most teams, the mesh model is the right choice.

Combining With Service Mesh

Two layers, distinct concerns:

  • Tailscale / zero trust: humans → services, services → external endpoints, cross-cluster.
  • Service mesh: services ↔ services inside a single cluster (mTLS, retries, routing).

Don't try to make either do the other's job. Use both.

What's Next

You can build identity-aware networking. Best Practices covers production operation — key management, observability, scaling, security hardening.
