Idempotency & Webhooks
Idempotency keys, exactly-once vs at-least-once, retries, and webhook design — delivery, HMAC signing, replay protection, and consumer-side idempotency
The network lies. A request that times out may have succeeded. A webhook you "delivered" may never have arrived. A response that never came back might have charged the customer twice. Distributed systems do not give you clean success and failure — they give you ambiguity, and your job is to make that ambiguity safe.
This page is about two sides of the same coin. Idempotency lets a caller retry without fear of duplicating work. Webhooks are you being the caller into someone else's system — with all the retry, signing, and ordering problems that come with it.
The Core Problem: At-Least-Once Delivery
In practice you almost never get exactly-once delivery. You get one of these:
| Guarantee | What it means | Reality |
|---|---|---|
| At-most-once | Each message delivered 0 or 1 times | Fire-and-forget; messages can be lost |
| At-least-once | Each message delivered 1 or more times | The default for any system with retries — duplicates happen |
| Exactly-once | Each message has exactly one effect | A property you build, not a transport you buy |
"Exactly-once delivery" is mostly a myth. What you actually build is at-least-once delivery + idempotent processing = exactly-once effect. Internalize this: you stop trying to prevent duplicates and start making them harmless.
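As a minimal sketch of that equation, the handler below records each message id it has applied and skips repeats. The Message shape, the in-memory Set, and the names applyOnce and balance are assumptions for illustration; a real system would keep the seen-set in durable storage.

```typescript
// Hypothetical message shape: the id must stay the same across redeliveries.
type Message = { id: string; amount: number };

const seen = new Set<string>(); // stand-in for a durable dedupe table
let balance = 0;

// Applies the message's effect at most once, however often it is delivered.
function applyOnce(msg: Message): boolean {
  if (seen.has(msg.id)) return false; // duplicate: the effect was already applied
  seen.add(msg.id);
  balance += msg.amount;
  return true;
}

// At-least-once delivery: message m1 arrives three times.
const deliveries: Message[] = [
  { id: 'm1', amount: 50 },
  { id: 'm1', amount: 50 },
  { id: 'm2', amount: 20 },
  { id: 'm1', amount: 50 },
];
const applied = deliveries.map(applyOnce);
```

Four deliveries arrive, but only two distinct ids exist, so the balance changes twice.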
Sender retries on timeout  ──► duplicates are inevitable
Receiver dedupes on a key  ──► duplicates become harmless
                             = exactly-once effect
Idempotency Keys
An idempotent operation produces the same result whether it runs once or five times. GET, PUT, and DELETE are idempotent by HTTP semantics. POST is not — and that is the dangerous one, because POST /charges creating a duplicate charge is a real customer-facing incident.
The fix: let the client supply a unique key, and the server guarantee that a key is processed at most once.
POST /charges HTTP/1.1
Idempotency-Key: 9f8a2c1e-4b6d-4e3a-8c1f-2d5e7a9b0c3d
Content-Type: application/json

{ "amount": 4999, "currency": "usd", "customer": "cus_123" }
The server stores the key, the request fingerprint, and the eventual response. A retry with the same key returns the stored response instead of re-executing.
async function handleCharge(req: Request): Promise<Response> {
  const key = req.headers['idempotency-key'];
  if (!key) return badRequest('Idempotency-Key header required');
  const fingerprint = hash(req.body); // detect key reuse with different payloads
  // Atomically claim the key (e.g. INSERT ... ON CONFLICT DO NOTHING).
  // A separate find-then-insert would let two concurrent retries both run the charge.
  // insertIfNew is assumed to return false when the key already exists.
  const claimed = await db.idempotency.insertIfNew({ key, fingerprint, status: 'in_progress' });
  if (!claimed) {
    const existing = await db.idempotency.findByKey(key);
    if (existing.fingerprint !== fingerprint) {
      return conflict('Idempotency-Key reused with a different request body');
    }
    if (existing.status === 'completed') {
      return storedResponse(existing); // replay the original result
    }
    // status === 'in_progress' → a retry arrived mid-flight
    return conflict('A request with this key is still being processed');
  }
  const result = await chargeCustomer(req.body); // the real work
  await db.idempotency.complete(key, { status: 'completed', response: result });
  return ok(result);
}
Design Rules
- Scope the key per endpoint (or per account), never global. Key abc on /charges and abc on /refunds are different operations.
- Fingerprint the request body. If the same key arrives with a different payload, that is a client bug: return 409, do not silently replay.
- Persist the response, not just the fact of completion. The whole point is that the retry gets the same answer, including the resource ID you generated.
- Expire keys. Keep them 24–72 hours. Long enough to cover all client retries, short enough that the table does not grow forever.
- The in-progress state matters. Two concurrent retries can race. Insert the key before doing the work, in the same transaction boundary if you can, so the second request sees in_progress and backs off.
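The first, second, and fourth rules can be sketched as three small helpers. This is an illustration under assumptions, not a prescribed implementation: the names scopedKey, fingerprint, and isExpired, the 48-hour TTL, and the choice of SHA-256 over the raw body are all hypothetical.

```typescript
import { createHash } from 'node:crypto';

// Assumption: 48 h, chosen from inside the 24–72 h window.
const KEY_TTL_MS = 48 * 60 * 60 * 1000;

// Scope the client's key by account and endpoint so "abc" on /charges
// and "abc" on /refunds never collide. JSON encoding keeps the parts unambiguous.
function scopedKey(accountId: string, method: string, path: string, key: string): string {
  return JSON.stringify([accountId, method.toUpperCase(), path, key]);
}

// Fingerprint the raw request body; a different payload yields a different hash.
function fingerprint(rawBody: string): string {
  return createHash('sha256').update(rawBody).digest('hex');
}

// A stored key is honored only while it is no older than the TTL.
function isExpired(createdAtMs: number, nowMs: number): boolean {
  return nowMs - createdAtMs > KEY_TTL_MS;
}
```

Hashing the raw bytes means two bodies that differ only in whitespace get different fingerprints. If clients may re-serialize between retries, canonicalizing the body before hashing is one option.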
Where to Store Keys
| Store | Fit | Notes |
|---|---|---|
| Primary DB row | Strong consistency with the work itself | Best when the operation already writes to that DB — one transaction |
| Redis with TTL | Fast, auto-expiry | Risk: key survives but the work failed, or vice versa — needs care |
| Dedicated table | Clean separation | The pragmatic default for payment-grade idempotency |
The subtle trap with a separate store: if you record the key in Redis but the DB write fails, you have "remembered" an operation that never happened. Tie the key's completion to the same commit as the work whenever the operation is a single-database write.
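To make the same-commit idea concrete, here is a toy in-memory sketch. MemoryDb, its transaction method, and chargeOnce are hypothetical stand-ins for a real database transaction; the point is only that the key and the charge become visible together or not at all.

```typescript
type Tables = { charges: Map<string, number>; idempotency: Map<string, string> };

// Toy "database" with all-or-nothing transactions (illustration only).
class MemoryDb {
  private tables: Tables = { charges: new Map(), idempotency: new Map() };

  get charges(): ReadonlyMap<string, number> { return this.tables.charges; }
  get idempotency(): ReadonlyMap<string, string> { return this.tables.idempotency; }

  // Runs fn against copies and publishes them only if fn does not throw.
  transaction(fn: (tx: Tables) => void): void {
    const draft: Tables = {
      charges: new Map(this.tables.charges),
      idempotency: new Map(this.tables.idempotency),
    };
    fn(draft);           // a throw here discards the draft
    this.tables = draft; // commit: both tables change together
  }
}

// Records the key and the charge in one transaction; replays the stored id on retry.
function chargeOnce(db: MemoryDb, key: string, chargeId: string, amount: number, failMidway = false): string {
  const stored = db.idempotency.get(key);
  if (stored !== undefined) return stored;
  db.transaction(tx => {
    tx.idempotency.set(key, chargeId);
    if (failMidway) throw new Error('simulated failure after recording the key');
    tx.charges.set(chargeId, amount);
  });
  return chargeId;
}
```

When the simulated failure fires, neither the key nor the charge is recorded, so a retry starts from a clean state instead of replaying a result that never existed.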
Webhooks: You Are Now the Caller
A webhook is an HTTP callback: when something happens in your system, you POST to a URL the consumer registered. It is the inverse of polling. Everything above about at-least-once applies — except now you are the unreliable sender, and someone else's flaky endpoint is the receiver.
Anatomy of a Good Webhook Payload
{
  "id": "evt_01HXYZ...",
  "type": "invoice.paid",
  "created_at": "2026-05-28T10:00:00Z",
  "api_version": "2026-01-15",
  "data": {
    "object": "invoice",
    "id": "inv_123",
    "amount_paid": 4999
  }
}
- A stable, unique event id. This is the consumer's idempotency key (see below). Make it immutable across retries of the same event.
- A type the consumer can route on without parsing data.
- A versioned schema. Webhook payloads are a public API. Breaking their shape silently breaks every consumer.
- Thin vs. fat payloads. Thin payloads send just IDs and force the consumer to call back for details (avoids stale/oversized data, but adds a round trip and auth surface). Fat payloads embed the data (fewer calls, but can be stale by delivery time and leak more in logs). Default to thin for sensitive data, fat for high-volume low-sensitivity events.
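One way a consumer might model this envelope is a discriminated union keyed on type, so routing never has to inspect data first. The Envelope helper, the thin invoice.created variant, and the route function are assumptions for illustration.

```typescript
// Envelope fields mirror the sample payload; T is the routing key.
type Envelope<T extends string, D> = {
  id: string;
  type: T;
  created_at: string;
  api_version: string;
  data: D;
};

// Fat payload: carries the amount. Thin payload: carries only the resource id.
type InvoicePaid = Envelope<'invoice.paid', { object: 'invoice'; id: string; amount_paid: number }>;
type InvoiceCreated = Envelope<'invoice.created', { object: 'invoice'; id: string }>;
type WebhookEvent = InvoicePaid | InvoiceCreated;

// Route on type alone; the compiler narrows data inside each case.
function route(event: WebhookEvent): string {
  switch (event.type) {
    case 'invoice.paid':
      return `paid ${event.data.id}: ${event.data.amount_paid}`;
    case 'invoice.created':
      return `created ${event.data.id} (details fetched separately)`;
    default:
      return 'ignored'; // unknown types from a newer api_version are skipped, not rejected
  }
}
```

The default branch is a design choice: ignoring unrecognized types lets the sender add event types without breaking existing consumers.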
Signing: HMAC Verification
The consumer must be able to prove the webhook came from you and was not tampered with. Sign the raw request body with a shared secret using HMAC-SHA256.
import * as crypto from 'node:crypto';

// Sender: produce the signature header
function signWebhook(rawBody: string, secret: string): string {
  const timestamp = Math.floor(Date.now() / 1000);
  const signedPayload = `${timestamp}.${rawBody}`;
  const signature = crypto
    .createHmac('sha256', secret)
    .update(signedPayload)
    .digest('hex');
  return `t=${timestamp},v1=${signature}`;
}
// Consumer: verify before trusting anything
function verifyWebhook(rawBody: string, header: string, secret: string): boolean {
  // Keep every "name=value" field: a header may carry more than one v1 during secret rotation
  const fields = header.split(',').map(p => p.trim().split('='));
  const timestamp = Number(fields.find(([k]) => k === 't')?.[1]);
  // Replay protection: reject a missing or malformed timestamp, or anything older than 5 minutes
  if (!Number.isFinite(timestamp)) return false;
  if (Math.abs(Date.now() / 1000 - timestamp) > 300) return false;
  const expected = Buffer.from(
    crypto.createHmac('sha256', secret).update(`${timestamp}.${rawBody}`).digest('hex'),
  );
  // Constant-time comparison defeats timing attacks. timingSafeEqual throws on
  // unequal lengths, so check the length first.
  return fields
    .filter(([k, v]) => k === 'v1' && typeof v === 'string')
    .some(([, v]) => {
      const given = Buffer.from(v);
      return given.length === expected.length && crypto.timingSafeEqual(expected, given);
    });
}
Non-negotiables:
- Sign the raw bytes, before any JSON parsing or re-serialization. Parse-then-reserialize changes whitespace and key order, and the signature will never match. Capture the raw body in middleware.
- Include a timestamp in the signed payload and reject old ones; this is your replay protection.
- Use constant-time comparison (timingSafeEqual), never ===.
- Support secret rotation. Send two signatures (v1 with the old secret, v1 with the new) during a rotation window so consumers never see a gap.
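The rotation rule can be sketched as follows, assuming the t=...,v1=... header format shown earlier. The names hmacHex, signWithRotation, and matchesAnyV1 are hypothetical, and the timestamp freshness check is left out here to keep the focus on rotation.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

function hmacHex(secret: string, payload: string): string {
  return createHmac('sha256', secret).update(payload).digest('hex');
}

// Sender: emit one v1 entry per active secret (old and new during the window).
function signWithRotation(rawBody: string, secrets: string[], timestamp: number): string {
  const sigs = secrets.map(s => `v1=${hmacHex(s, `${timestamp}.${rawBody}`)}`);
  return [`t=${timestamp}`, ...sigs].join(',');
}

// Consumer: holds a single secret and accepts the header if any v1 entry matches it.
function matchesAnyV1(rawBody: string, header: string, secret: string): boolean {
  const fields = header.split(',');
  const t = fields.find(f => f.startsWith('t='))?.slice(2) ?? '';
  const expected = Buffer.from(hmacHex(secret, `${t}.${rawBody}`));
  return fields
    .filter(f => f.startsWith('v1='))
    .map(f => Buffer.from(f.slice(3)))
    // Length check first: timingSafeEqual throws on unequal lengths.
    .some(given => given.length === expected.length && timingSafeEqual(given, expected));
}
```

A consumer that still holds the old secret and one that has already switched to the new secret both verify the same delivery, which is what closes the gap during the rotation window.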
Delivery and Retries with Backoff
Consumers go down. Your delivery system must retry — with exponential backoff and jitter, exactly as in the Resilience patterns — and eventually give up.
// Delay before each attempt, in seconds
const RETRY_SCHEDULE = [
  0, // immediate
  60, // 1 min
  300, // 5 min
  1_800, // 30 min
  7_200, // 2 hr
  36_000, // 10 hr
  86_400, // 24 hr → then dead-letter
];
async function deliver(event: WebhookEvent, attempt = 0): Promise<void> {
  let accepted = false;
  try {
    const res = await fetchWithTimeout(event.url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Webhook-Signature': signWebhook(event.rawBody, event.secret),
        'Webhook-Id': event.id,
      },
      body: event.rawBody,
      timeoutMs: 10_000,
    });
    // Only 2xx counts as accepted. Everything else retries.
    accepted = res.status >= 200 && res.status < 300;
  } catch {
    // A timeout or network error is a failed attempt too; fall through to the retry path
  }
  if (accepted) {
    return markDelivered(event.id);
  }
  if (attempt + 1 >= RETRY_SCHEDULE.length) {
    return moveToDeadLetter(event.id); // surface in the consumer's dashboard
  }
  await scheduleRetry(event, attempt + 1, RETRY_SCHEDULE[attempt + 1]);
}
- Treat only 2xx as success. A 3xx redirect, a 4xx, or a slow 5xx all mean "retry" (except a persistent 4xx, which may mean the endpoint is permanently wrong; log it loudly).
- Short timeout (5–10s). The consumer should acknowledge fast and process asynchronously, not do heavy work inside the request.
- Dead-letter after the schedule is exhausted. Give consumers a UI to inspect and manually replay failed events. Never drop silently.
- Cap concurrency per consumer endpoint with a bulkhead so one slow consumer does not starve delivery to everyone else.
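The fixed schedule gives the backoff; jitter still has to be applied on top of it. A minimal sketch, assuming a symmetric ±20% spread (one of several common jitter variants) and hypothetical names jitteredDelay and nextDelay:

```typescript
// Spread a scheduled delay by ±spread so retries for many events do not align.
// `random` is injectable so the function can be tested deterministically.
function jitteredDelay(baseSeconds: number, spread = 0.2, random: () => number = Math.random): number {
  const factor = 1 + (random() * 2 - 1) * spread; // in [1 - spread, 1 + spread)
  return Math.round(baseSeconds * factor);
}

// Returns the jittered delay before the given attempt, or null once the
// schedule is exhausted and the event should be dead-lettered.
function nextDelay(
  schedule: readonly number[],
  attempt: number,
  random: () => number = Math.random,
): number | null {
  const base = schedule[attempt];
  if (base === undefined) return null;
  return jitteredDelay(base, 0.2, random);
}
```

A caller would pass the result to its scheduler in place of the raw schedule entry and dead-letter the event when it gets null.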
Ordering
Webhooks are not ordered by default. Retries, parallel delivery, and clock skew mean invoice.paid can arrive before invoice.created. Do not assume order.
Sent:      created → updated → paid
Delivered: created → paid → updated   (updated was retried after a 503)
Strategies, from cheap to expensive:
- Make consumers order-independent. Each event carries the full current state, or enough to reconcile. This is the best default.
- Sequence numbers. Include a monotonically increasing sequence per resource; the consumer ignores any event with a sequence lower than what it has already applied.
- Per-key serialized delivery. Deliver events for the same resource one at a time, waiting for ack before sending the next. Strong ordering, but slow and operationally heavy; only do this if consumers genuinely cannot reconcile.
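The sequence-number strategy might look like this on the consumer side. The SequencedEvent shape and the in-memory maps are assumptions; this sketch also drops events whose sequence equals the last applied one, which doubles as duplicate suppression.

```typescript
// Hypothetical event shape: sequence increases monotonically per resource.
type SequencedEvent = { resourceId: string; sequence: number; status: string };

const lastApplied = new Map<string, number>(); // resourceId -> highest sequence applied
const currentStatus = new Map<string, string>(); // resourceId -> latest known status

// Applies the event only if it is newer than anything already applied.
function applyIfNewer(e: SequencedEvent): boolean {
  const last = lastApplied.get(e.resourceId) ?? -1;
  if (e.sequence <= last) return false; // stale or duplicate: ignore
  lastApplied.set(e.resourceId, e.sequence);
  currentStatus.set(e.resourceId, e.status);
  return true;
}

// Replays the out-of-order delivery from the example above.
const outcomes = [
  { resourceId: 'inv_123', sequence: 1, status: 'created' },
  { resourceId: 'inv_123', sequence: 3, status: 'paid' },
  { resourceId: 'inv_123', sequence: 2, status: 'updated' }, // arrives late
].map(applyIfNewer);
```

The late updated event is skipped, so the invoice stays paid. This only works when each event carries enough state to stand on its own, since skipped events are never applied.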
Consumer-Side Idempotency
Because delivery is at-least-once, the consumer must dedupe. This is the mirror image of the idempotency-key section — the webhook's event id is the key.
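async function handleWebhook(req: Request): Promise<Response> {
  if (!verifyWebhook(req.rawBody, req.headers['webhook-signature'] ?? '', SECRET)) {
    return unauthorized();
  }
  const event = JSON.parse(req.rawBody);
  // Dedupe on the event id. INSERT ... ON CONFLICT DO NOTHING is atomic.
  const inserted = await db.processedEvents.insertIfNew(event.id);
  if (!inserted) {
    return ok(); // already handled: ack so the sender stops retrying
  }
  try {
    await processEvent(event);
  } catch (err) {
    // Forget the id so the sender's retry is processed rather than acked as a duplicate.
    // (remove is an assumed helper; one transaction around both steps also works.)
    await db.processedEvents.remove(event.id);
    throw err; // surfaces as a non-2xx, which triggers redelivery
  }
  return ok(); // 2xx, fast
}
Notice the consumer returns 200 even for a duplicate. Acking the duplicate is correct: it tells the sender to stop retrying. Returning an error on a duplicate would make the sender keep redelivering an event that was already handled, until its retry schedule runs out and the event is dead-lettered.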
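When processing is heavy, the same dedupe step can sit in front of a queue so the request path only acknowledges. A minimal in-memory sketch; the names accept and drain, and the Set and array standing in for a durable table and a job queue, are assumptions.

```typescript
type Ack = { status: number; queued: boolean };

const seenIds = new Set<string>(); // stand-in for a durable dedupe table
const queue: string[] = []; // stand-in for a job queue

// Request path: dedupe, enqueue, and return 200 immediately.
function accept(eventId: string): Ack {
  if (seenIds.has(eventId)) return { status: 200, queued: false }; // duplicate: still ack
  seenIds.add(eventId);
  queue.push(eventId);
  return { status: 200, queued: true };
}

// Worker path: runs outside the request, so slow work never delays the ack.
function drain(handler: (eventId: string) => void): number {
  let processed = 0;
  for (let id = queue.shift(); id !== undefined; id = queue.shift()) {
    handler(id);
    processed++;
  }
  return processed;
}
```

With a real queue, the enqueue and the dedupe insert should be durable before the 200 is sent. Otherwise a crash between the ack and the processing loses the event.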
Putting It Together
The full loop, sender and receiver, each holding up their half:
SENDER (you emit webhooks)          RECEIVER (you consume webhooks)
─────────────────────────           ───────────────────────────────
stable event id             ┐       verify HMAC + timestamp
HMAC-sign raw body          │       reject replays (> 5 min old)
POST with short timeout     │       INSERT event_id (dedupe)
retry until 2xx w/ backoff  ├──►    process exactly once
dead-letter after N tries   │       ack 2xx fast (even on dupes)
per-consumer bulkhead       ┘       process heavy work async
Checklist
For idempotent endpoints:
- Mutating POST endpoints accept an Idempotency-Key.
- Keys are scoped per endpoint/account, fingerprinted against the body, and expired.
- The key's completion commits atomically with the work it represents.
- Concurrent retries with the same key are serialized (in-progress state).
For webhooks you send:
- Payloads carry a stable, unique event id and a versioned schema.
- Bodies are HMAC-signed over the raw bytes, with a timestamp for replay protection.
- Delivery retries with exponential backoff and dead-letters after exhaustion.
- Secret rotation is supported without a delivery gap.
For webhooks you consume:
- Verify the signature on the raw body before parsing.
- Reject stale timestamps.
- Dedupe on the event id and ack duplicates with 2xx.
- Acknowledge fast; do heavy processing asynchronously.