Steven's Knowledge

Service Mesh

Network infrastructure for service-to-service traffic: mTLS, traffic management, and observability without changing app code

A service mesh is dedicated infrastructure that handles service-to-service communication. Instead of every app shipping its own retry library, mTLS handshake, circuit breaker, and tracing code, you push that work into a sidecar proxy (or a node-level proxy) sitting next to your app. Your app keeps talking plain HTTP; the mesh makes it secure, observable, and resilient.

Why Use One

| Without a mesh | With a mesh |
|---|---|
| Every team rewrites retries, timeouts, circuit breakers | Configured once at the platform layer |
| Service-to-service TLS = manual cert management per app | Automatic mTLS for every connection |
| Traffic shifting requires app-level feature flags | Weighted routing as a config primitive |
| Tracing requires every app to instrument | L7 metrics and spans collected by the proxy |
| Cross-language consistency is a soft promise | Same proxy, same behavior, all languages |

The Sidecar Model

        ┌─────────────────────────────┐  ┌─────────────────────────────┐
Pod →   │   App   ───►  Sidecar proxy │  │ Sidecar proxy ───►  App     │   ← Pod
        │         (Envoy/Linkerd2-px) │  │ (Envoy/Linkerd2-px)         │
        └─────────────────────────────┘  └─────────────────────────────┘
                          │                              ▲
                          └────── mTLS + L7 ─────────────┘

                              ┌────────▼────────┐
                              │  Control plane  │  policies, certs, telemetry
                              └─────────────────┘

Every pod gets a proxy injected next to it. The app sends a plain HTTP call to auth-service; the sidecar intercepts it, encrypts with mTLS, retries on failure, emits metrics and a trace span, and ships it to the destination's sidecar.
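The retry-and-record behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Envoy or linkerd2-proxy is implemented. The names `sidecar_call`, `Metrics`, `TransportError`, and `make_flaky_upstream` are all hypothetical, and mTLS and tracing are left out to keep the sketch small.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Metrics:
    """Counters a proxy might export; the field names are illustrative."""
    requests: int = 0
    failures: int = 0
    retries: int = 0
    latencies_ms: List[float] = field(default_factory=list)


class TransportError(Exception):
    """Stands in for a connection reset, timeout, or 503 from upstream."""


def sidecar_call(send: Callable[[], str], metrics: Metrics,
                 max_attempts: int = 3, backoff_s: float = 0.0) -> str:
    """Forward one request, retrying transport failures and recording metrics."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    metrics.requests += 1
    start = time.perf_counter()
    try:
        for attempt in range(1, max_attempts + 1):
            try:
                return send()
            except TransportError:
                if attempt == max_attempts:
                    # Retries exhausted: count the failure and surface it to the app.
                    metrics.failures += 1
                    raise
                metrics.retries += 1
                time.sleep(backoff_s * attempt)  # simple linear backoff
    finally:
        # Latency is recorded once per request, whether it succeeded or not.
        metrics.latencies_ms.append((time.perf_counter() - start) * 1000)


def make_flaky_upstream(failures_before_success: int) -> Callable[[], str]:
    """Hypothetical upstream that fails a fixed number of times, then answers."""
    state = {"calls": 0}

    def send() -> str:
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransportError("connection reset")
        return "200 OK"

    return send


metrics = Metrics()
response = sidecar_call(make_flaky_upstream(2), metrics)
print(response, "after", metrics.retries, "retries")
```

The app only sees the final response; the two failed attempts show up in the proxy's counters rather than in application code. Retrying like this is only safe for requests that can be repeated without side effects, which is why business-level retries stay in your code.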

The Players

| Mesh | Proxy | Notes |
|---|---|---|
| Istio | Envoy | Feature-rich, Kubernetes-native, large community. The newer ambient mode removes the per-pod sidecar. |
| Linkerd | linkerd2-proxy (Rust) | Smaller, faster, simpler ops. CNCF graduated. |
| Consul Connect | Envoy | Multi-runtime (K8s + VMs); tied to Consul service discovery. |
| Cilium Service Mesh | eBPF + Envoy | Avoids sidecars by using eBPF; tied to the Cilium CNI. |
| Kuma / Open Service Mesh | Envoy | Kuma is a CNCF project created by Kong; OSM has been archived and is no longer maintained. |

What a Mesh Buys You (and What It Doesn't)

What a mesh provides natively

| Capability | How |
|---|---|
| mTLS everywhere | Automatic identity + cert rotation |
| L7 retries, timeouts, circuit breaking | Configured per service via CRDs |
| Weighted traffic splitting | Canary deploys without app changes |
| Golden L7 metrics | RPS, error rate, P50/P99 latency per route |
| Distributed tracing | Proxies emit spans; you need a collector, and apps must still forward trace context headers so spans join into one trace |
| Authorization policies | "service A can call service B's /admin" rules |
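To make weighted traffic splitting concrete, here is a minimal Python sketch of the selection step a proxy performs for each request. It is an illustration under stated assumptions, not any mesh's actual implementation: the function `pick_backend` and the backend names `checkout-v1` and `checkout-v2` are hypothetical, and a real mesh reads the weights from its routing configuration rather than from a dictionary.

```python
import random
from collections import Counter
from typing import Dict, Optional


def pick_backend(weights: Dict[str, int],
                 rng: Optional[random.Random] = None) -> str:
    """Choose a backend with probability proportional to its weight."""
    if not weights or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and non-negative")
    if sum(weights.values()) == 0:
        raise ValueError("at least one weight must be greater than zero")
    rng = rng if rng is not None else random.Random()
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical canary: send roughly 10% of requests to the new version.
canary_split = {"checkout-v1": 90, "checkout-v2": 10}
rng = random.Random(42)  # seeded so repeated runs give the same counts
counts = Counter(pick_backend(canary_split, rng) for _ in range(10_000))
print(counts)
```

Shifting more traffic to the canary means changing the two numbers; nothing in the application changes, which is the point of treating the split as configuration.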

What a mesh does not provide

| Capability | What you still need |
|---|---|
| North-south traffic (internet → cluster) | An API Gateway / Ingress controller |
| Application secrets | A secret manager (see Vault) |
| Container orchestration | Kubernetes |
| Application logic (business retries, transactions) | Your code; meshes do transport retries |

A mesh isn't free. Sidecars consume CPU and memory per pod; the control plane needs care and feeding; debugging gets one layer deeper. A mesh is the right answer when you have many services and the cross-cutting concerns are a real pain. For five services, you probably don't need one yet.

When NOT to Adopt a Mesh

  • Fewer than ~10 services where cross-cutting concerns are tractable in libraries.
  • No internal mTLS / zero-trust requirement.
  • Tight latency budgets where the sidecar hop (1-3 ms) matters.
  • Team doesn't have someone willing to own the mesh control plane.
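A quick back-of-the-envelope calculation shows when the sidecar hop starts to matter. The sketch below assumes the 1-3 ms figure applies to each service-to-service call and that the calls in a request happen one after another. The call depth of 4 and the 50 ms budget are hypothetical numbers chosen for illustration, not measurements.

```python
from typing import Tuple


def mesh_overhead_ms(sequential_calls: int,
                     per_call_ms: Tuple[float, float] = (1.0, 3.0)) -> Tuple[float, float]:
    """Return the (low, high) latency added by the mesh across a chain of calls."""
    if sequential_calls < 0:
        raise ValueError("sequential_calls must not be negative")
    low, high = per_call_ms
    return sequential_calls * low, sequential_calls * high


# Hypothetical request: 4 sequential service-to-service calls, 50 ms budget.
budget_ms = 50.0
low, high = mesh_overhead_ms(4)
worst_share = high / budget_ms
print(f"mesh adds {low:.0f}-{high:.0f} ms, up to {worst_share:.0%} of the budget")
```

Under these assumptions the mesh can consume close to a quarter of the budget in the worst case, while the same overhead would be negligible against a budget of several hundred milliseconds. Calls made in parallel overlap, so only the longest sequential path counts.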

A common middle path: start with a CNI that has built-in network policies (Cilium, Calico) and an Ingress controller for north-south traffic. Reach for a mesh when the platform team and the service count have both grown enough to justify it.
