Network infrastructure for service-to-service traffic: mTLS, traffic management, and observability without changing app code

Service Mesh

A service mesh is dedicated infrastructure that handles service-to-service communication. Instead of every app shipping its own retry library, mTLS handshake, circuit breaker, and tracing code, you push that work into a sidecar proxy (or a node-level proxy) sitting next to your app. Your app keeps talking plain HTTP; the mesh makes it secure, observable, and resilient.

Why Use One

Without a mesh	With a mesh
Every team rewrites retries, timeouts, circuit breakers	Configured once at the platform layer
Service-to-service TLS = manual cert management per app	Automatic mTLS for every connection
Traffic shifting requires app-level feature flags	Weighted routing as a config primitive
Tracing requires every app to instrument	L7 metrics and spans emitted by the proxy (apps only forward trace headers)
Cross-language consistency is a soft promise	Same proxy, same behavior, all languages
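To make the first row concrete, here is a minimal sketch of the kind of circuit breaker each team would otherwise maintain in its own codebase. The class name, thresholds, and injectable `clock` parameter are illustrative assumptions, not any mesh's or library's API; a mesh proxy applies comparable logic outside the app.

```python
import time


class CircuitBreaker:
    """Toy circuit breaker: opens after N consecutive failures, then
    allows a trial call once a cooldown has elapsed (hypothetical sketch)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: fail fast without touching the upstream.
                raise RuntimeError("circuit open: failing fast")
            # Cooldown elapsed: fall through and allow one trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Multiply this by every language and every team, and the appeal of configuring it once at the platform layer becomes clearer.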

The Sidecar Model

        ┌─────────────────────────────┐  ┌─────────────────────────────┐
Pod →   │   App   ───►  Sidecar proxy │  │ Sidecar proxy ───►  App     │   ← Pod
        │         (Envoy/Linkerd2-px) │  │ (Envoy/Linkerd2-px)         │
        └─────────────────────────────┘  └─────────────────────────────┘
                          │                              ▲
                          └────── mTLS + L7 ─────────────┘
                                       │
                              ┌────────▼────────┐
                              │  Control plane  │  policies, certs, telemetry
                              └─────────────────┘

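Every pod gets a proxy container injected alongside the app container. When the app sends a plain HTTP call to auth-service, the sidecar intercepts it, wraps the connection in mTLS, retries on failure if so configured, emits metrics and a trace span, and forwards the request to the destination's sidecar, which terminates mTLS and hands the request to the destination app.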
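The outbound path can be pictured with a toy simulation. This is only a sketch under stated assumptions: `SidecarSim`, `flaky_auth_service`, and the metric names are hypothetical, there is no real networking or mTLS here, and real proxies such as Envoy or linkerd2-proxy are configured rather than coded this way.

```python
from dataclasses import dataclass, field


@dataclass
class SidecarSim:
    """Toy stand-in for a sidecar's outbound path; not a real proxy."""
    max_retries: int = 2
    metrics: dict = field(
        default_factory=lambda: {"requests": 0, "retries": 0, "failures": 0}
    )

    def send(self, upstream, request):
        """Forward `request` to `upstream`, retrying on ConnectionError."""
        self.metrics["requests"] += 1
        for attempt in range(self.max_retries + 1):
            try:
                return upstream(request)
            except ConnectionError:
                if attempt == self.max_retries:
                    # Retries exhausted: record the failure and surface it.
                    self.metrics["failures"] += 1
                    raise
                self.metrics["retries"] += 1


def flaky_auth_service():
    """Hypothetical upstream that drops the first connection, then succeeds."""
    calls = {"n": 0}

    def handler(request):
        calls["n"] += 1
        if calls["n"] == 1:
            raise ConnectionError("connection reset")
        return f"200 OK for {request}"

    return handler


sidecar = SidecarSim()
response = sidecar.send(flaky_auth_service(), "GET /login")
```

The caller sees a single successful response; the retry shows up only in `sidecar.metrics`, which is the point of moving this logic out of the app.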

The Players

Mesh	Proxy	Notes
Istio	Envoy	Feature-rich, Kubernetes-native, large community. The newer Ambient mode replaces the per-pod sidecar with a per-node proxy (ztunnel) plus optional waypoint proxies.
Linkerd	linkerd2-proxy (Rust)	Smaller, faster, simpler ops. CNCF graduated.
Consul Connect	Envoy	Multi-runtime (K8s + VMs); tied to Consul service discovery.
Cilium Service Mesh	eBPF + Envoy	Bypasses sidecars by using eBPF; tied to the Cilium CNI.
Kuma / Open Service Mesh	Envoy	Kuma was created by Kong and donated to the CNCF; OSM has been archived and is no longer developed.

Learning Path

1. Getting Started

Install Linkerd on a kind cluster, inject a sidecar, see mTLS and metrics

2. Istio vs Linkerd

Side-by-side comparison, when to pick which, Ambient mode

3. Best Practices

Production patterns - rollout strategy, when not to use, scaling, ops

What a Mesh Buys You (and What It Doesn't)

What a mesh provides natively

Capability	How
mTLS everywhere	Automatic identity + cert rotation
L7 retries, timeouts, circuit breaking	Configured per service via CRDs
Weighted traffic splitting	Canary deploys without app changes
Golden L7 metrics	RPS, error rate, P50/P99 latency per route
Distributed tracing	Proxies emit spans; apps must still propagate trace headers, and you need a collector
Authorization policies	"service A can call service B's `/admin`" rules
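As an illustration of the weighted traffic splitting row, the following sketch picks a backend in proportion to configured weights. The backend names and the 90/10 split are hypothetical; in a real mesh the weights live in configuration (CRDs) and the proxy makes this choice per request.

```python
import random


def pick_backend(weights, rng=random):
    """Return a backend name chosen in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical canary: about 90% of requests to v1, about 10% to v2.
split = {"reviews-v1": 90, "reviews-v2": 10}

rng = random.Random(42)  # seeded so repeated runs route identically
counts = {name: 0 for name in split}
for _ in range(10_000):
    counts[pick_backend(split, rng)] += 1
```

After the loop, `counts` should show roughly a 9:1 ratio. Shifting the canary from 10% to 50% is then a change to `split`, not to application code.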

What a mesh does not provide

Capability	What you still need
North-south traffic (internet → cluster)	An API Gateway / Ingress controller
Application secrets	A secret manager (see Vault)
Container orchestration	Kubernetes
Application logic (business-level retries, transactions)	Your code; a mesh only retries at the transport level

A mesh isn't free. Sidecars consume CPU and memory per pod; the control plane needs care and feeding; debugging gets one layer deeper. A mesh is the right answer when you have many services and the cross-cutting concerns are a real pain. For five services, you probably don't need one yet.

When NOT to Adopt a Mesh

Fewer than ~10 services where cross-cutting concerns are tractable in libraries.
No internal mTLS / zero-trust requirement.
Tight latency budgets where the added sidecar hop (roughly 1-3 ms) matters.
No one on the team is willing to own the mesh control plane.
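To judge whether the latency point applies, a rough back-of-the-envelope calculation helps. The sketch below assumes the 1-3 ms figure applies once per service-to-service call and that the calls in a request happen sequentially; the function names and the 50 ms budget are hypothetical.

```python
def added_latency_ms(call_depth, per_hop_ms):
    """Extra latency for a chain of `call_depth` sequential service calls."""
    return call_depth * per_hop_ms


def budget_share(call_depth, per_hop_ms, budget_ms):
    """Fraction of the latency budget consumed by mesh overhead alone."""
    return added_latency_ms(call_depth, per_hop_ms) / budget_ms


for depth in (1, 3, 5):
    low = added_latency_ms(depth, 1)
    high = added_latency_ms(depth, 3)
    print(f"{depth} sequential call(s): +{low} to +{high} ms")

# Worst case for a 5-deep chain against a hypothetical 50 ms budget.
worst_share = budget_share(5, 3, 50)
```

Under these assumptions a five-deep call chain could spend up to 15 ms, or 30% of a 50 ms budget, on proxy hops, which is the situation the latency bullet warns about. Measure the overhead in your own environment before relying on these numbers.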

A common middle path: start with a CNI that has built-in network policies (Cilium, Calico) and an Ingress controller for north-south. Reach for a mesh when the platform team and the service count both grow enough to justify it.
