Service Mesh
Network infrastructure for service-to-service traffic - mTLS, traffic management, observability without changing app code
A service mesh is dedicated infrastructure that handles service-to-service communication. Instead of every app shipping its own retry library, mTLS handshake, circuit breaker, and tracing code, you push that work into a sidecar proxy (or a node-level proxy) sitting next to your app. Your app keeps talking plain HTTP; the mesh makes it secure, observable, and resilient.
Why Use One
| Without a mesh | With a mesh |
|---|---|
| Every team rewrites retries, timeouts, circuit breakers | Configured once at the platform layer |
| Service-to-service TLS = manual cert management per app | Automatic mTLS for every connection |
| Traffic shifting requires app-level feature flags | Weighted routing as a config primitive |
| Tracing requires every app to emit its own spans | Proxy collects L7 metrics and spans; apps only need to forward trace headers |
| Cross-language consistency is a soft promise | Same proxy, same behavior, all languages |
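To make the left-hand column concrete, here is a minimal sketch of the kind of circuit breaker each team ends up writing and maintaining when there is no mesh. The class name, thresholds, and injectable `clock` parameter are illustrative assumptions, not code from any particular library.

```python
import time


class CircuitBreaker:
    """Toy circuit breaker: opens after N consecutive failures, allows a trial call after a cooldown."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock          # injectable so the behavior can be tested without sleeping
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # Cooldown elapsed: half-open, let one trial call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Multiply this by every language and every team, and the appeal of configuring it once at the platform layer becomes clearer.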
The Sidecar Model
              ┌─────────────────────────────┐     ┌─────────────────────────────┐
    Pod → │ App ───► Sidecar proxy      │     │ Sidecar proxy ───► App      │ ← Pod
              │         (Envoy/Linkerd2-px) │     │ (Envoy/Linkerd2-px)         │
              └─────────────────────────────┘     └─────────────────────────────┘
                             │                              ▲
                             └────── mTLS + L7 ─────────────┘
                                            │
                                   ┌────────▼────────┐
                                   │  Control plane  │  policies, certs, telemetry
                                   └─────────────────┘

Every pod gets a proxy injected next to it. The app sends a plain HTTP call to auth-service; the sidecar intercepts it, encrypts with mTLS, retries on failure, emits metrics and a trace span, and ships it to the destination's sidecar.
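The interception step can be approximated in a few lines. This is a toy model only: `ToySidecar`, `flaky_transport`, and the metric names are hypothetical, and the mTLS step is left as a comment because a real proxy handles it at the connection level.

```python
import time
from collections import Counter


class ToySidecar:
    """Wraps an outbound call roughly the way a sidecar does: retry on failure, record metrics."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.metrics = Counter()   # (destination, outcome) -> count
        self.latencies = []        # seconds per logical request, retries included

    def send(self, destination, request, transport):
        start = time.perf_counter()
        try:
            for attempt in range(1, self.max_attempts + 1):
                try:
                    # A real proxy would wrap this hop in mTLS; the sketch just calls through.
                    response = transport(destination, request)
                except ConnectionError:
                    self.metrics[(destination, "attempt_failed")] += 1
                    if attempt == self.max_attempts:
                        raise  # retries exhausted: surface the error to the app
                else:
                    self.metrics[(destination, "success")] += 1
                    return response
        finally:
            self.latencies.append(time.perf_counter() - start)


# Hypothetical upstream that fails twice, then succeeds.
calls = {"n": 0}

def flaky_transport(destination, request):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection reset")
    return f"200 OK from {destination}"


sidecar = ToySidecar(max_attempts=3)
print(sidecar.send("auth-service", "GET /login", flaky_transport))
print(dict(sidecar.metrics))
```

The app-side code is just the `send` call; the retries and the counters live entirely in the wrapper, which is the division of labor the sidecar model aims for.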
The Players
| Mesh | Proxy | Notes |
|---|---|---|
| Istio | Envoy | Feature-rich, Kubernetes-native, large community. Ambient mode replaces per-pod sidecars with a per-node ztunnel and optional waypoint proxies. |
| Linkerd | linkerd2-proxy (Rust) | Smaller, faster, simpler ops. CNCF graduated. |
| Consul Connect | Envoy | Multi-runtime (K8s + VMs); tied to Consul service discovery. |
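| Cilium Service Mesh | eBPF + Envoy | Avoids per-pod sidecars: eBPF handles L3/L4 in the kernel and a per-node Envoy handles L7; tied to the Cilium CNI. |
| Kuma / Open Service Mesh | Envoy | Kuma was created by Kong and donated to the CNCF; OSM has been archived and is no longer maintained. |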
Learning Path
1. Getting Started: install Linkerd on a kind cluster, inject a sidecar, and see mTLS and metrics.
2. Istio vs Linkerd: side-by-side comparison, when to pick which, and Ambient mode.
3. Best Practices: production patterns, including rollout strategy, when not to use a mesh, scaling, and ops.
What a Mesh Buys You (and What It Doesn't)
What a mesh provides natively
| Capability | How |
|---|---|
| mTLS everywhere | Automatic identity + cert rotation |
| L7 retries, timeouts, circuit breaking | Configured per service via CRDs |
| Weighted traffic splitting | Canary deploys without app changes |
| Golden L7 metrics | RPS, error rate, P50/P99 latency per route |
| Distributed tracing | Proxies emit spans; you need a collector, and apps must forward trace headers so spans join into one trace |
| Authorization policies | "service A can call service B's /admin" rules |
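Weighted traffic splitting is the easiest of these capabilities to picture in code. In a mesh it is declared as configuration, but per request the proxy effectively makes a weighted random choice like the one below. The backend names and the 90/10 split are hypothetical.

```python
import random
from collections import Counter


def pick_backend(weights, rng=random):
    """Pick a backend name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical canary: about 90% of requests to v1, 10% to v2.
canary = {"reviews-v1": 90, "reviews-v2": 10}

rng = random.Random(42)  # seeded so the run is repeatable
counts = Counter(pick_backend(canary, rng) for _ in range(10_000))
print(counts)
```

Shifting the canary from 10% to 50% means changing one number in the weights, with no redeploy of the calling service.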
What a mesh does not provide
| Capability | What you still need |
|---|---|
| North-south traffic (internet → cluster) | An API Gateway / Ingress controller |
| Application secrets | A secret manager — see Vault |
| Container orchestration | Kubernetes |
| Application logic — business retries, transactions | Your code; meshes do transport retries |
A mesh isn't free. Sidecars consume CPU and memory per pod; the control plane needs care and feeding; debugging gets one layer deeper. A mesh is the right answer when you have many services and the cross-cutting concerns are a real pain. For five services, you probably don't need one yet.
When NOT to Adopt a Mesh
- Fewer than ~10 services where cross-cutting concerns are tractable in libraries.
- No internal mTLS / zero-trust requirement.
- Tight latency budgets where the added sidecar hop (roughly 1-3 ms; measure it in your own environment) matters.
- Team doesn't have someone willing to own the mesh control plane.
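The latency point is worth working through with numbers, because the overhead compounds along a call chain. The sketch below assumes the 1-3 ms figure applies once per service-to-service call and that the calls happen sequentially; the request shape (4 calls, 40 ms of app time, 50 ms budget) is hypothetical.

```python
def mesh_overhead_ms(sequential_calls, per_call_overhead_ms):
    """Added latency when every call on the critical path pays the sidecar hop."""
    return sequential_calls * per_call_overhead_ms


def fits_budget(budget_ms, app_latency_ms, sequential_calls, per_call_overhead_ms):
    """True if app time plus mesh overhead stays within the latency budget."""
    total = app_latency_ms + mesh_overhead_ms(sequential_calls, per_call_overhead_ms)
    return total <= budget_ms


# Hypothetical request: 4 sequential calls, 40 ms of app time, 50 ms budget.
for overhead in (1, 3):
    total = 40 + mesh_overhead_ms(4, overhead)
    ok = fits_budget(50, 40, 4, overhead)
    print(f"{overhead} ms/call -> {total} ms total, fits 50 ms budget: {ok}")
```

At 1 ms per call the request lands at 44 ms and fits; at 3 ms it reaches 52 ms and does not. The same mesh can be fine for a shallow call graph and a problem for a deep one.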
A common middle path: start with a CNI that has built-in network policies (Cilium, Calico) and an Ingress controller for north-south. Reach for a mesh when the platform team and the service count both grow enough to justify it.