Steven's Knowledge

Idempotency & Webhooks

Idempotency keys, exactly-once vs at-least-once, retries, and webhook design — delivery, HMAC signing, replay protection, and consumer-side idempotency


The network lies. A request that times out may have succeeded. A webhook you "delivered" may never have arrived. A charge whose response never came back may have gone through, and retrying it blindly can charge the customer twice. Distributed systems do not give you clean success and failure; they give you ambiguity, and your job is to make that ambiguity safe.

This page is about two sides of the same coin. Idempotency lets a caller retry without fear of duplicating work. With webhooks, you become the caller into someone else's system, and you inherit all the retry, signing, and ordering problems that come with it.

The Core Problem: At-Least-Once Delivery

In practice you almost never get exactly-once delivery. You get one of these:

| Guarantee | What it means | Reality |
|---|---|---|
| At-most-once | Each message delivered 0 or 1 times | Fire-and-forget; messages can be lost |
| At-least-once | Each message delivered 1 or more times | The default for any system with retries; duplicates happen |
| Exactly-once | Each message has exactly one effect | A property you build, not a transport you buy |

"Exactly-once delivery" is mostly a myth. What you actually build is at-least-once delivery + idempotent processing = exactly-once effect. Internalize this: you stop trying to prevent duplicates and start making them harmless.

Sender retries on timeout  ──►  duplicates are inevitable
Receiver dedupes on a key  ──►  duplicates become harmless
                                 = exactly-once effect
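The diagram above can be made concrete with a toy model. This is a minimal in-memory sketch, not a production receiver: the `Set` stands in for a durable dedupe table, and the names (`Message`, `Receiver`) are hypothetical.

```typescript
// Hypothetical message shape: a stable id plus the change it carries.
type Message = { id: string; amount: number };

class Receiver {
  // Ids already applied. In production this is a durable table with a unique index.
  private readonly seen = new Set<string>();
  balance = 0;

  // Returns true if the message changed state, false if it was a harmless duplicate.
  handle(msg: Message): boolean {
    if (this.seen.has(msg.id)) return false;
    this.seen.add(msg.id);
    this.balance += msg.amount;
    return true;
  }
}

// At-least-once: the sender timed out twice and resent the same message.
const receiver = new Receiver();
const msg: Message = { id: 'msg_1', amount: 50 };
const results = [receiver.handle(msg), receiver.handle(msg), receiver.handle(msg)];
console.log(results, receiver.balance); // [ true, false, false ] 50
```

Three deliveries, one effect: the duplicates are not prevented, only made harmless.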

Idempotency Keys

An idempotent operation produces the same result whether it runs once or five times. GET, PUT, and DELETE are idempotent by HTTP semantics. POST is not — and that is the dangerous one, because POST /charges creating a duplicate charge is a real customer-facing incident.

The fix: let the client supply a unique key, and the server guarantee that a key is processed at most once.

POST /charges HTTP/1.1
Idempotency-Key: 9f8a2c1e-4b6d-4e3a-8c1f-2d5e7a9b0c3d
Content-Type: application/json

{ "amount": 4999, "currency": "usd", "customer": "cus_123" }

The server stores the key, the request fingerprint, and the eventual response. A retry with the same key returns the stored response instead of re-executing.

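async function handleCharge(req: Request): Promise<Response> {
  const clientKey = req.headers['idempotency-key'];
  if (!clientKey) return badRequest('Idempotency-Key header required');

  // Scope per account and endpoint: "abc" on /charges and "abc" on /refunds are different keys
  const key = `${req.accountId}:POST /charges:${clientKey}`;
  const fingerprint = hash(req.body); // detect key reuse with different payloads

  // Atomic claim: INSERT ... ON CONFLICT DO NOTHING on a unique (key) column.
  // A find-then-insert would let two concurrent retries both see "no row" and both charge.
  const claimed = await db.idempotency.insertIfAbsent({ key, fingerprint, status: 'in_progress' });

  if (!claimed) {
    const existing = await db.idempotency.findByKey(key);
    if (existing.fingerprint !== fingerprint) {
      return conflict('Idempotency-Key reused with a different request body');
    }
    if (existing.status === 'completed') {
      return storedResponse(existing); // replay the original result
    }
    // status === 'in_progress' → a retry arrived mid-flight
    return conflict('A request with this key is still being processed');
  }

  try {
    const result = await chargeCustomer(req.body); // the real work
    await db.idempotency.complete(key, { status: 'completed', response: result });
    return ok(result);
  } catch (err) {
    // Free the key only if the charge definitely did not happen, so the client can retry.
    // After an ambiguous failure (e.g. a provider timeout), keep it locked until reconciled.
    if (isDefinitelyNotCharged(err)) await db.idempotency.delete(key);
    throw err;
  }
}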
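The handler above covers the server. The other half of the contract is on the client: generate the key once per logical operation and reuse it on every retry. A minimal sketch, assuming a hypothetical `send` transport and a fake server that "loses" its first response; only `randomUUID` from `node:crypto` is a real API here.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical transport: one HTTP attempt that resolves with a response or throws on timeout.
type Send = (body: string, idempotencyKey: string) => Promise<{ status: number; body: string }>;

// The key is generated ONCE per logical operation and reused on every retry.
// Minting a fresh key per attempt would make each retry look like a new charge.
async function postWithRetries(send: Send, body: string, maxAttempts = 3) {
  const key = randomUUID();
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send(body, key);
    } catch (err) {
      lastError = err; // ambiguous: the server may already have done the work
    }
  }
  throw lastError;
}

// Fake server for illustration: it performs the charge, but the first response is "lost".
const chargeIdByKey = new Map<string, string>();
let calls = 0;
const flakyServer: Send = async (_body, key) => {
  calls++;
  if (!chargeIdByKey.has(key)) chargeIdByKey.set(key, `ch_${chargeIdByKey.size + 1}`);
  if (calls === 1) throw new Error('timeout'); // work done, response never arrives
  return { status: 200, body: chargeIdByKey.get(key)! };
};
```

Calling `postWithRetries(flakyServer, '{"amount":4999}')` resolves with `ch_1` on the second attempt, and the fake server records a single charge because both attempts carried the same key.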

Design Rules

  • Scope the key per endpoint (or per account), never global. Key abc on /charges and abc on /refunds are different operations.
  • Fingerprint the request body. If the same key arrives with a different payload, that is a client bug — return 409, do not silently replay.
  • Persist the response, not just the fact of completion. The whole point is that the retry gets the same answer, including the resource ID you generated.
  • Expire keys. Keep them 24–72 hours. Long enough to cover all client retries, short enough that the table does not grow forever.
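  • The in-progress state matters. Two concurrent retries can race. Claim the key with an atomic insert (a unique constraint plus INSERT ... ON CONFLICT, not a read followed by a write) before doing the work, in the same transaction boundary if you can, so the second request sees in_progress and backs off.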
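The rules above fit together in one small structure. The following is a hypothetical in-memory sketch (class and method names are illustrative, not a library API): it is single-process and not durable, so the "atomic" claim relies on JavaScript running the check-and-insert synchronously, where a real store would rely on a unique constraint.

```typescript
import { createHash } from 'node:crypto';

type Entry =
  | { status: 'in_progress'; fingerprint: string; expiresAt: number }
  | { status: 'completed'; fingerprint: string; expiresAt: number; response: unknown };

type ClaimResult =
  | { kind: 'claimed' }                   // caller owns the key: do the work
  | { kind: 'replay'; response: unknown } // return the stored response
  | { kind: 'in_progress' }               // concurrent retry: answer 409
  | { kind: 'mismatch' };                 // same key, different body: answer 409

class IdempotencyStore {
  private readonly entries = new Map<string, Entry>();

  constructor(
    private readonly ttlMs = 24 * 60 * 60 * 1000, // 24 h, the low end of the 24–72 h range
    private readonly now: () => number = () => Date.now(),
  ) {}

  // Scope by account and endpoint, never globally.
  private scoped(account: string, endpoint: string, key: string): string {
    return JSON.stringify([account, endpoint, key]);
  }

  // Check and insert happen in one synchronous step, so the claim is atomic in this sketch.
  claim(account: string, endpoint: string, key: string, rawBody: string): ClaimResult {
    const id = this.scoped(account, endpoint, key);
    // Fingerprint the body (raw bytes here; canonicalizing JSON first is another option)
    const fingerprint = createHash('sha256').update(rawBody).digest('hex');
    const existing = this.entries.get(id);
    // Expired entries are treated as absent
    if (existing && existing.expiresAt > this.now()) {
      if (existing.fingerprint !== fingerprint) return { kind: 'mismatch' };
      return existing.status === 'completed'
        ? { kind: 'replay', response: existing.response }
        : { kind: 'in_progress' };
    }
    // Record in_progress BEFORE the caller starts the work
    this.entries.set(id, { status: 'in_progress', fingerprint, expiresAt: this.now() + this.ttlMs });
    return { kind: 'claimed' };
  }

  // Persist the full response (including generated IDs), not just "done".
  complete(account: string, endpoint: string, key: string, response: unknown): void {
    const id = this.scoped(account, endpoint, key);
    const entry = this.entries.get(id);
    if (!entry) throw new Error('complete() called without a prior claim()');
    this.entries.set(id, { status: 'completed', fingerprint: entry.fingerprint, expiresAt: entry.expiresAt, response });
  }
}

const store = new IdempotencyStore();
const body = '{"amount":4999}';
const first = store.claim('acct_1', 'POST /charges', 'abc', body);  // claimed: do the work
const racing = store.claim('acct_1', 'POST /charges', 'abc', body); // in_progress: 409
store.complete('acct_1', 'POST /charges', 'abc', { id: 'ch_1' });
const retry = store.claim('acct_1', 'POST /charges', 'abc', body);  // replay { id: 'ch_1' }
```

The injectable `now` clock is there so expiry can be tested without waiting a day.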

Where to Store Keys

| Store | Fit | Notes |
|---|---|---|
| Primary DB row | Strong consistency with the work itself | Best when the operation already writes to that DB: one transaction |
| Redis with TTL | Fast, auto-expiry | Risk: the key survives but the work failed, or vice versa; needs care |
| Dedicated table | Clean separation | The pragmatic default for payment-grade idempotency |

The subtle trap with a separate store: if you record the key in Redis but the DB write fails, you have "remembered" an operation that never happened. Tie the key's completion to the same commit as the work whenever the operation is a single-database write.
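To show why committing the key together with the work matters, here is a toy in-memory "database" whose `transaction()` publishes both maps or neither. It is a hypothetical model of a single-database transaction, not a real driver API; `crashBeforeCommit` is a test hook that simulates a failure after both writes.

```typescript
// Toy single-database model: writes made inside transaction() commit together or not at all.
class TinyDb {
  charges = new Map<string, number>();         // the work: charge id -> amount
  idempotencyKeys = new Map<string, string>(); // idempotency key -> charge id

  transaction<T>(fn: (tx: TinyDb) => T): T {
    // Run fn against copies; publish them only if fn returns without throwing.
    const tx = new TinyDb();
    tx.charges = new Map(this.charges);
    tx.idempotencyKeys = new Map(this.idempotencyKeys);
    const result = fn(tx); // a throw here discards the copies: rollback
    this.charges = tx.charges; // commit both "tables" in one step
    this.idempotencyKeys = tx.idempotencyKeys;
    return result;
  }
}

// The key and the charge are written in the same transaction.
function createCharge(db: TinyDb, key: string, amount: number, crashBeforeCommit = false): string {
  return db.transaction(tx => {
    const existing = tx.idempotencyKeys.get(key);
    if (existing) return existing; // retry: replay the original charge id
    const chargeId = `ch_${tx.charges.size + 1}`;
    tx.charges.set(chargeId, amount);
    tx.idempotencyKeys.set(key, chargeId);
    if (crashBeforeCommit) throw new Error('simulated crash');
    return chargeId;
  });
}

const db = new TinyDb();
try {
  createCharge(db, 'key_1', 4999, true);
} catch {
  // the crash rolled back the charge AND the key together
}
const leftovers = db.charges.size + db.idempotencyKeys.size; // 0
const id = createCharge(db, 'key_1', 4999);    // 'ch_1': the retry does the work for real
const again = createCharge(db, 'key_1', 4999); // 'ch_1': replayed, no second charge
```

With the key in Redis and the charge in the database, the crash could leave a remembered key with no charge; a shared commit rules that out.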

Webhooks: You Are Now the Caller

A webhook is an HTTP callback: when something happens in your system, you POST to a URL the consumer registered. It is the inverse of polling. Everything above about at-least-once applies — except now you are the unreliable sender, and someone else's flaky endpoint is the receiver.

Anatomy of a Good Webhook Payload

{
  "id": "evt_01HXYZ...",
  "type": "invoice.paid",
  "created_at": "2026-05-28T10:00:00Z",
  "api_version": "2026-01-15",
  "data": {
    "object": "invoice",
    "id": "inv_123",
    "amount_paid": 4999
  }
}
  • A stable, unique event id. This is the consumer's idempotency key (see below). Make it immutable across retries of the same event.
  • A type the consumer can route on without parsing data.
  • A versioned schema. Webhook payloads are a public API. Breaking their shape silently breaks every consumer.
  • Thin vs. fat payloads. Thin payloads send just IDs and force the consumer to call back for details (avoids stale/oversized data, but adds a round trip and auth surface). Fat payloads embed the data (fewer calls, but can be stale by delivery time and leak more in logs). Default to thin for sensitive data, fat for high-volume low-sensitivity events.
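The payload properties above can be sketched as types plus a builder and a consumer-side router. This is a hypothetical illustration: the `evt_` + UUID id format is an assumption (the example above suggests a ULID-style id), `InvoiceRef` is a thin-payload variant, and `buildEvent`/`route` are not from any library.

```typescript
import { randomUUID } from 'node:crypto';

// Envelope mirroring the payload above; T is the shape of `data`.
interface WebhookEvent<T = unknown> {
  id: string;          // stable across every retry of this event
  type: string;        // routing key, e.g. "invoice.paid"
  created_at: string;  // ISO-8601 timestamp
  api_version: string; // schema version the payload was rendered with
  data: T;
}

// Thin payload: identifiers only; the consumer fetches details with its own credentials.
interface InvoiceRef {
  object: 'invoice';
  id: string;
}

// Build the event ONCE and keep its serialized bytes. Every retry resends exactly
// this string, so the id and the signed bytes never change between attempts.
function buildEvent<T>(type: string, data: T, apiVersion: string) {
  const event: WebhookEvent<T> = {
    id: `evt_${randomUUID()}`, // assumption: any unique, immutable id works
    type,
    created_at: new Date().toISOString(),
    api_version: apiVersion,
    data,
  };
  return { event, rawBody: JSON.stringify(event) };
}

// Consumer side: dispatch on `type` alone. Unknown types are ignored (and acked),
// so the sender can add event types without breaking older consumers.
const handlers: Record<string, (event: WebhookEvent) => string> = {
  'invoice.paid': e => `mark paid: ${(e.data as InvoiceRef).id}`,
};

function route(rawBody: string): string {
  const event = JSON.parse(rawBody) as WebhookEvent;
  const handler = handlers[event.type];
  return handler ? handler(event) : `ignored ${event.type}`;
}

const { event, rawBody } = buildEvent<InvoiceRef>('invoice.paid', { object: 'invoice', id: 'inv_123' }, '2026-01-15');
route(rawBody); // "mark paid: inv_123"
```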

Signing: HMAC Verification

The consumer must be able to prove the webhook came from you and was not tampered with. Sign the raw request body with a shared secret using HMAC-SHA256.

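import { createHmac, timingSafeEqual } from 'node:crypto';

// Sender: produce the signature header
function signWebhook(rawBody: string, secret: string): string {
  const timestamp = Math.floor(Date.now() / 1000);
  const signedPayload = `${timestamp}.${rawBody}`;
  const signature = createHmac('sha256', secret)
    .update(signedPayload)
    .digest('hex');
  return `t=${timestamp},v1=${signature}`;
}

// Consumer: verify before trusting anything
function verifyWebhook(rawBody: string, header: string, secret: string): boolean {
  let timestamp = NaN;
  const signatures: string[] = []; // more than one v1 is sent during secret rotation
  for (const part of header.split(',')) {
    const [name, value] = part.split('=');
    if (name === 't') timestamp = Number(value);
    else if (name === 'v1' && value) signatures.push(value);
  }

  // Replay protection: reject a missing timestamp or anything older than 5 minutes.
  // The isFinite check matters: NaN > 300 is false, so NaN would otherwise pass.
  if (!Number.isFinite(timestamp)) return false;
  if (Math.abs(Date.now() / 1000 - timestamp) > 300) return false;

  const expected = createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`)
    .digest();

  // Constant-time comparison defeats timing attacks. timingSafeEqual throws
  // when lengths differ, so check the length first instead of crashing.
  return signatures.some((sig) => {
    const given = Buffer.from(sig, 'hex');
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}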

Non-negotiables:

  • Sign the raw bytes, before any JSON parsing or re-serialization. Parse-then-reserialize changes whitespace and key order, and the signature will never match. Capture the raw body in middleware.
  • Include a timestamp in the signed payload and reject old ones — this is your replay protection.
  • Use constant-time comparison (timingSafeEqual), never ===.
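  • Support secret rotation. During a rotation window, send two v1 signatures in the header, one computed with the old secret and one with the new, and have consumers accept the request if any of them matches. Consumers can then switch secrets at any point in the window without a gap.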
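Two of these rules are easy to see in a few lines: rotation via multiple v1 entries, and why re-serialized JSON never verifies. A minimal sketch using Node's `crypto` module; `signWithSecrets`, `verifyAny`, and the `whsec_*` secrets are hypothetical names for illustration.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

function hmacHex(secret: string, timestamp: number, rawBody: string): string {
  return createHmac('sha256', secret).update(`${timestamp}.${rawBody}`).digest('hex');
}

// Sender during a rotation window: one v1 entry per active secret, all over the same timestamp.
function signWithSecrets(rawBody: string, secrets: string[], timestamp = Math.floor(Date.now() / 1000)): string {
  return [`t=${timestamp}`, ...secrets.map(s => `v1=${hmacHex(s, timestamp, rawBody)}`)].join(',');
}

// Consumer: accept if ANY v1 entry matches the one secret it currently holds.
function verifyAny(rawBody: string, header: string, secret: string): boolean {
  const fields = header.split(',').map(part => part.split('='));
  const timestamp = Number(fields.find(([k]) => k === 't')?.[1]);
  if (!Number.isFinite(timestamp) || Math.abs(Date.now() / 1000 - timestamp) > 300) return false;
  const expected = Buffer.from(hmacHex(secret, timestamp, rawBody), 'hex');
  return fields
    .filter(([k]) => k === 'v1')
    .some(([, sig]) => {
      const given = Buffer.from(sig ?? '', 'hex');
      // timingSafeEqual throws on a length mismatch, so compare lengths first
      return given.length === expected.length && timingSafeEqual(given, expected);
    });
}

const raw = '{"id": "evt_1", "type": "invoice.paid"}';
const header = signWithSecrets(raw, ['whsec_old', 'whsec_new']);
const okOld = verifyAny(raw, header, 'whsec_old'); // true: consumer has not rotated yet
const okNew = verifyAny(raw, header, 'whsec_new'); // true: consumer already rotated
const okReserialized = verifyAny(JSON.stringify(JSON.parse(raw)), header, 'whsec_new'); // false
```

The last call fails because `JSON.stringify(JSON.parse(raw))` drops the spaces, so the bytes no longer match what was signed.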

Delivery and Retries with Backoff

Consumers go down. Your delivery system must retry — with exponential backoff and jitter, exactly as in the Resilience patterns — and eventually give up.

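// Delay before each attempt, in seconds; the gaps grow roughly exponentially.
const RETRY_SCHEDULE = [
  0,          // immediate
  60,         // 1 min
  300,        // 5 min
  1_800,      // 30 min
  7_200,      // 2 hr
  36_000,     // 10 hr
  86_400,     // 24 hr → then dead-letter
];

async function deliver(event: WebhookEvent, attempt = 0): Promise<void> {
  let accepted = false;
  try {
    const res = await fetchWithTimeout(event.url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Re-signed on every attempt, so the timestamp stays inside the consumer's replay window
        'Webhook-Signature': signWebhook(event.rawBody, event.secret),
        'Webhook-Id': event.id,
      },
      body: event.rawBody,
      timeoutMs: 10_000,
    });
    // Only 2xx counts as accepted. Everything else retries.
    accepted = res.status >= 200 && res.status < 300;
  } catch {
    // Timeouts and connection errors are failures too; without this catch they
    // would reject the promise and the event would never be rescheduled.
  }

  if (accepted) {
    return markDelivered(event.id);
  }

  if (attempt + 1 >= RETRY_SCHEDULE.length) {
    return moveToDeadLetter(event.id); // surface in the consumer's dashboard
  }

  // Jitter (±20%) so events that failed together do not all retry in the same instant
  const delaySeconds = RETRY_SCHEDULE[attempt + 1] * (0.8 + Math.random() * 0.4);
  await scheduleRetry(event, attempt + 1, delaySeconds);
}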
  • Treat only 2xx as success. A 3xx redirect, a 4xx, a 5xx, or a timeout all mean "retry". The one worth flagging is a persistent 4xx (such as 404 or 410), which may mean the endpoint is permanently wrong: log it loudly.
  • Short timeout (5–10s). The consumer should acknowledge fast and process asynchronously, not do heavy work inside the request.
  • Dead-letter after the schedule is exhausted. Give consumers a UI to inspect and manually replay failed events. Never drop silently.
  • Cap concurrency per consumer endpoint with a bulkhead so one slow consumer does not starve delivery to everyone else.
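The bulkhead in the last point can be as small as a per-URL semaphore. A hypothetical sketch (`Bulkhead` and `bulkheadFor` are illustrative names, the default limit of 4 is an arbitrary assumption, and state lives in one process's memory):

```typescript
// Per-endpoint bulkhead: at most `limit` deliveries in flight to any one consumer URL.
class Bulkhead {
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(private readonly limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    // Wait for a free slot; re-check after waking, since another caller may have taken it.
    while (this.active >= this.limit) {
      await new Promise<void>(resolve => this.waiting.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      this.waiting.shift()?.(); // wake the next waiter, if any
    }
  }
}

// One bulkhead per consumer endpoint, created lazily. `limit` only applies on first use.
const bulkheads = new Map<string, Bulkhead>();

function bulkheadFor(url: string, limit = 4): Bulkhead {
  let bulkhead = bulkheads.get(url);
  if (!bulkhead) {
    bulkhead = new Bulkhead(limit);
    bulkheads.set(url, bulkhead);
  }
  return bulkhead;
}

// Usage, wrapping the deliver() function from the retry section:
//   await bulkheadFor(event.url).run(() => deliver(event));
```

A slow consumer fills only its own slots; deliveries to other URLs go through their own bulkheads and never queue behind it.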

Ordering

Webhooks are not ordered by default. Retries, parallel delivery, and clock skew mean invoice.paid can arrive before invoice.created. Do not assume order.

Sent:      created → updated → paid
Delivered: created → paid → updated   (updated was retried after a 503)

Strategies, from cheap to expensive:

  • Make consumers order-independent. Each event carries the full current state, or enough to reconcile. This is the best default.
  • Sequence numbers. Include a monotonically increasing sequence per resource; the consumer ignores any event with a sequence lower than what it has already applied.
  • Per-key serialized delivery. Deliver events for the same resource one at a time, waiting for ack before sending the next. Strong ordering, but slow and operationally heavy — only do this if consumers genuinely cannot reconcile.
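The sequence-number strategy can be sketched in a few lines. This assumes a hypothetical event shape in which the sender stamps a per-resource `sequence` and each event carries the resource's full state, so skipping a stale event loses nothing; it also drops equal sequences, which doubles as dedupe.

```typescript
// Hypothetical event shape: the sender stamps a per-resource sequence number on every event.
interface SequencedEvent {
  resourceId: string;
  sequence: number; // monotonically increasing per resource
  state: string;    // full current state of the resource as of this sequence
}

class SequencedApplier {
  private readonly lastApplied = new Map<string, number>();
  readonly current = new Map<string, string>();

  // Apply only if newer than anything already applied for this resource.
  // Older (stale) and equal (duplicate) sequences are ignored.
  apply(event: SequencedEvent): boolean {
    const last = this.lastApplied.get(event.resourceId) ?? -Infinity;
    if (event.sequence <= last) return false;
    this.lastApplied.set(event.resourceId, event.sequence);
    this.current.set(event.resourceId, event.state);
    return true;
  }
}

// Delivery order from the diagram: created (1), paid (3), then the retried updated (2).
const applier = new SequencedApplier();
const applied = [
  applier.apply({ resourceId: 'inv_123', sequence: 1, state: 'created' }),
  applier.apply({ resourceId: 'inv_123', sequence: 3, state: 'paid' }),
  applier.apply({ resourceId: 'inv_123', sequence: 2, state: 'updated' }), // stale, skipped
];
console.log(applied, applier.current.get('inv_123')); // [ true, true, false ] paid
```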

Consumer-Side Idempotency

Because delivery is at-least-once, the consumer must dedupe. This is the mirror image of the idempotency-key section — the webhook's event id is the key.

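async function handleWebhook(req: Request): Promise<Response> {
  // Verify the raw bytes before parsing; a missing header is simply unauthorized
  const signature = req.headers['webhook-signature'];
  if (!signature || !verifyWebhook(req.rawBody, signature, SECRET)) {
    return unauthorized();
  }

  const event = JSON.parse(req.rawBody);

  // Dedupe on the event id. INSERT ... ON CONFLICT DO NOTHING is atomic.
  const inserted = await db.processedEvents.insertIfNew(event.id);
  if (!inserted) {
    return ok(); // already handled: ack so the sender stops retrying
  }

  try {
    await processEvent(event); // keep this fast, e.g. enqueue for a background worker
  } catch {
    // Un-mark the event so the sender's next retry is processed, not acked as a duplicate.
    // (Better still: commit the dedupe row and the processing in one transaction.)
    await db.processedEvents.delete(event.id);
    return serverError();
  }
  return ok(); // 2xx, fast
}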

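Notice the consumer returns 200 even for a duplicate. Acking the duplicate is correct: it tells the sender to stop retrying. Returning an error on a duplicate would make the sender keep redelivering an event that was already handled, until its retry schedule runs out and the event lands in the dead-letter queue.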
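"Ack fast, process later" usually means the request path only dedupes and enqueues. A hypothetical in-memory sketch: `JobQueue` stands in for a durable queue (a jobs table or a broker), and `acceptedIds` for the processed-events table. In a real system the dedupe row and the enqueued job should commit together, for example in one database transaction; otherwise a crash between the two writes loses the event.

```typescript
// Hypothetical job shape and an in-memory stand-in for a durable queue.
type Job = { eventId: string; type: string };

class JobQueue {
  private readonly jobs: Job[] = [];

  enqueue(job: Job): void {
    this.jobs.push(job);
  }

  // Process everything currently queued. A real worker would poll or subscribe instead.
  async drain(worker: (job: Job) => Promise<void>): Promise<number> {
    let count = 0;
    for (let job = this.jobs.shift(); job; job = this.jobs.shift()) {
      await worker(job);
      count++;
    }
    return count;
  }
}

const queue = new JobQueue();
const acceptedIds = new Set<string>(); // stands in for the processed_events table

// Request path: dedupe and enqueue only, then ack. No heavy work before the 2xx.
function acceptWebhook(event: { id: string; type: string }): number {
  if (!acceptedIds.has(event.id)) {
    acceptedIds.add(event.id);
    queue.enqueue({ eventId: event.id, type: event.type });
  }
  return 200; // ack duplicates too, so the sender stops retrying
}

acceptWebhook({ id: 'evt_1', type: 'invoice.paid' }); // 200, enqueued
acceptWebhook({ id: 'evt_1', type: 'invoice.paid' }); // 200, duplicate: not enqueued again
```

Heavy processing then happens in the worker passed to `drain`, off the request path and outside the sender's 5–10 s timeout.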

Putting It Together

The full loop, sender and receiver, each holding up their half:

SENDER (you emit webhooks)            RECEIVER (you consume webhooks)
─────────────────────────            ───────────────────────────────
stable event id                 ┐    verify HMAC + timestamp
HMAC-sign raw body              │    reject replays (> 5 min old)
POST with short timeout         │    INSERT event_id (dedupe)
retry non-2xx w/ backoff        ├──► process exactly once
dead-letter after N tries       │    ack 2xx fast (even on dupes)
per-consumer bulkhead           ┘    process heavy work async

Checklist

For idempotent endpoints:

  • Mutating POST endpoints accept an Idempotency-Key.
  • Keys are scoped per endpoint/account, fingerprinted against the body, and expired.
  • The key's completion commits atomically with the work it represents.
  • Concurrent retries with the same key are serialized (in-progress state).

For webhooks you send:

  • Payloads carry a stable, unique event id and a versioned schema.
  • Bodies are HMAC-signed over the raw bytes, with a timestamp for replay protection.
  • Delivery retries with exponential backoff and dead-letters after exhaustion.
  • Secret rotation is supported without a delivery gap.

For webhooks you consume:

  • Verify the signature on the raw body before parsing.
  • Reject stale timestamps.
  • Dedupe on the event id and ack duplicates with 2xx.
  • Acknowledge fast; do heavy processing asynchronously.
