
Idempotent Webhook Processing: Deduplication and Retry Handling in FastAPI and Next.js

Production-ready webhook handling with deduplication and retry logic in FastAPI and Next.js. Prevent duplicate charges and handle partial failures robustly.


Webhooks are deceptively simple until they bite you. A payment provider fires an event, your server hiccups, they retry, and suddenly you've charged a customer twice. I've seen this pattern cause real damage — not hypothetical damage, real "refund queue on a Monday morning" damage. After 25 years of building SaaS products, idempotent webhook processing is one of those patterns I now reach for before anything else.

This tutorial covers a production-ready approach to deduplication and retry handling, with concrete code for both a FastAPI backend and a Next.js API route. It also covers what happens when processing partially fails, how to build a robust background retry layer, and the per-provider specifics you need to configure correctly.


Why Webhooks Need Idempotency

HTTP is unreliable. Every major provider delivers events on an at-least-once basis — never exactly-once. Stripe's own documentation explicitly states that endpoints "might occasionally receive the same event more than once." AWS SQS Standard queues make the same guarantee. Treating delivery as exactly-once is a category error; build for duplicates by default.

The most common trigger is not a dramatic network failure. It's your handler finishing the work but responding a few milliseconds too late, so the provider's timeout fires, it assumes failure, and retries an operation that already succeeded. The retry loop then processes a completed event a second time.

Idempotency means processing the same event multiple times produces exactly the same result as processing it once. The mechanism is straightforward: record a unique identifier for each event before you act on it, and reject duplicates.

Provider Retry Schedules

Different providers have very different retry windows. Configure your deduplication TTLs and dead-letter queues against the specific provider you're integrating — not a single global number.

| Provider | Retry budget | Backoff strategy | Unique ID header/field | Notes |
|----------|--------------|------------------|------------------------|-------|
| Stripe | Up to 3 days (live mode) | Exponential backoff | event.id in JSON body | Includes Stripe-Signature with timestamp for replay protection |
| Shopify | 8 attempts over ~4 hours | Fixed intervals | X-Shopify-Webhook-Id header | Subscription may be auto-deleted after 8 consecutive failures |
| Svix | ~8 attempts over ~1 day | Exponential | webhook-id header | Used by many platforms as their webhook infrastructure |
| GitHub | No automatic retries | None (manual or API-triggered redelivery) | X-GitHub-Delivery header | Delivery ID stays the same when a delivery is redelivered |

Key implication: Your deduplication store TTL must outlive the provider's full retry window. For Stripe, that means at minimum 3 days. For Shopify, 4–5 hours is sufficient. Using a hard-coded 24-hour TTL is fine as a default but will miss Stripe's late retries unless you bump it.
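
One way to keep these numbers out of handler code is a small lookup that derives the TTL from each provider's retry window plus a margin. This is only a sketch: the RETRY_WINDOWS mapping, the dedup_ttl_seconds helper, and the one-hour margin are illustrative assumptions, with the window values taken from the table above.

```python
from datetime import timedelta

# Retry windows from the table above; adjust if your provider's docs differ.
RETRY_WINDOWS = {
    "stripe": timedelta(days=3),
    "shopify": timedelta(hours=4),
    "svix": timedelta(days=1),
}

# Hypothetical margin so a key never expires just before the final retry.
SAFETY_MARGIN = timedelta(hours=1)
# Fallback for providers that are not listed (the 24-hour default from the text).
DEFAULT_WINDOW = timedelta(hours=24)


def dedup_ttl_seconds(provider: str) -> int:
    """Return a deduplication TTL that outlives the provider's retry window."""
    window = RETRY_WINDOWS.get(provider.lower(), DEFAULT_WINDOW)
    return int((window + SAFETY_MARGIN).total_seconds())
```

The returned value can be passed straight to a Redis EX option or used as the cutoff for a cleanup job on the database table.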


The Core Pattern

  1. Verify the signature first, before touching the database.
  2. Extract a unique event ID from the incoming payload or headers.
  3. Attempt to insert that ID into a deduplication store (database or cache).
  4. If the insert succeeds, process the event.
  5. If the insert fails (duplicate key), return 200 immediately — do not reprocess.

Returning 200 on a duplicate is intentional. You're telling the provider "I've handled this," which stops the retry loop. Returning 4xx or 5xx on a duplicate makes the provider keep retrying until its retry budget runs out, and some providers then disable or delete the subscription.
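
The flow can be sketched end to end in plain Python before any framework is involved. Everything here is illustrative: the SECRET value, the handle_webhook function, and the in-memory set are assumptions standing in for your real secret and store. A set's check-then-add is not atomic across processes, so a real deployment needs the atomic insert described in the sections that follow.

```python
import hashlib
import hmac
import json

SECRET = b"test-secret"            # placeholder secret for this sketch
seen_event_ids: set[str] = set()   # stand-in for a database or cache


def handle_webhook(raw_body: bytes, signature: str) -> tuple[int, str]:
    """Return (http_status, outcome) for one webhook delivery."""
    # Verify the signature before any other work
    expected = hmac.new(SECRET, raw_body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return 401, "invalid_signature"

    # Extract the provider-supplied event ID
    event_id = json.loads(raw_body).get("id")
    if not event_id:
        return 400, "missing_event_id"

    # A repeat delivery is acknowledged with 200 and skipped
    if event_id in seen_event_ids:
        return 200, "duplicate"
    seen_event_ids.add(event_id)

    # Business logic would run here (omitted), then we acknowledge
    return 200, "processed"


body = json.dumps({"id": "evt_1", "type": "payment.succeeded"}).encode()
sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
first = handle_webhook(body, sig)    # first delivery
second = handle_webhook(body, sig)   # provider retry of the same event
```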


FastAPI Implementation

Setting Up the Deduplication Table

Using PostgreSQL:

CREATE TABLE webhook_events (
    event_id    TEXT PRIMARY KEY,
    source      TEXT NOT NULL,
    received_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    status      TEXT NOT NULL DEFAULT 'pending', -- pending | processed | failed
    payload     JSONB
);

-- Index for the background retry worker
CREATE INDEX webhook_events_status_idx ON webhook_events (status, received_at)
    WHERE status IN ('pending', 'failed');

The PRIMARY KEY constraint does the heavy lifting — attempting to insert a duplicate event_id raises a unique violation. The status column supports the partial-failure recovery pattern described below.
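
If you would rather not use an exception for control flow, the same claim can be written with ON CONFLICT DO NOTHING and a row-count check. The sketch below is an alternative to catching the unique violation, not the approach used in the endpoint that follows. It uses an in-memory SQLite database as a stand-in for PostgreSQL so it runs anywhere, and claim_event is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL in this sketch
conn.execute(
    "CREATE TABLE webhook_events ("
    " event_id TEXT PRIMARY KEY,"
    " source TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'pending')"
)


def claim_event(event_id: str, source: str) -> bool:
    """Return True if this call inserted the row, False if it already existed."""
    cur = conn.execute(
        "INSERT INTO webhook_events (event_id, source) VALUES (?, ?) "
        "ON CONFLICT (event_id) DO NOTHING",
        (event_id, source),
    )
    conn.commit()
    # One inserted row means we own the event; zero means a duplicate delivery
    return cur.rowcount == 1


first_claim = claim_event("evt_1", "payment")
second_claim = claim_event("evt_1", "payment")
```

The primary key still does the work; the only difference is that a duplicate shows up as zero affected rows rather than an error.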

The FastAPI Endpoint

from fastapi import FastAPI, Request, HTTPException, Depends
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.exc import IntegrityError
from sqlalchemy import text
import hashlib, hmac, json

# get_db, settings, and process_payment_event are assumed to be defined elsewhere in your app

app = FastAPI()

def verify_signature(raw_body: bytes, signature: str, secret: str) -> bool:
    # Deliberately a plain function: the handler calls it without await, and an
    # un-awaited coroutine is always truthy, which would accept every request
    expected = hmac.new(
        secret.encode(), raw_body, hashlib.sha256
    ).hexdigest()
    # Compare as bytes; compare_digest rejects non-ASCII str arguments
    return hmac.compare_digest(expected.encode(), signature.encode())

@app.post("/webhooks/payment")
async def handle_payment_webhook(
    request: Request,
    db: AsyncSession = Depends(get_db),
):
    raw_body = await request.body()
    signature = request.headers.get("X-Signature", "")

    # 1. Verify signature BEFORE touching the database
    if not verify_signature(raw_body, signature, settings.WEBHOOK_SECRET):
        raise HTTPException(status_code=401, detail="Invalid signature")

    payload = json.loads(raw_body)
    event_id = payload.get("id")  # provider-supplied unique ID

    if not event_id:
        raise HTTPException(status_code=400, detail="Missing event ID")

    # 2. Attempt to claim the event (atomic insert)
    try:
        await db.execute(
            text(
                "INSERT INTO webhook_events (event_id, source, status, payload) "
                "VALUES (:event_id, :source, 'pending', :payload)"
            ),
            {"event_id": event_id, "source": "payment", "payload": json.dumps(payload)},
        )
        await db.commit()
    except IntegrityError:
        # Duplicate: already processed or in-flight, acknowledge and return
        await db.rollback()
        return {"status": "duplicate", "event_id": event_id}

    # 3. Process: we are guaranteed to be the first and only handler
    try:
        await process_payment_event(payload)
        await db.execute(
            text("UPDATE webhook_events SET status = 'processed' WHERE event_id = :eid"),
            {"eid": event_id},
        )
        await db.commit()
    except Exception:
        # Clear any half-finished transaction, then mark the event failed
        # so the background worker can retry it
        await db.rollback()
        await db.execute(
            text("UPDATE webhook_events SET status = 'failed' WHERE event_id = :eid"),
            {"eid": event_id},
        )
        await db.commit()
        # Still return 200: the event was received and will be retried internally.
        # Returning 5xx here would cause the provider to retry before your worker does
        return {"status": "queued_for_retry", "event_id": event_id}

    return {"status": "processed", "event_id": event_id}

Always verify the signature before hitting the database — it's cheap and keeps bad actors from polluting your deduplication store. Use hmac.compare_digest rather than == to avoid timing attacks.

Stripe-Specific Signature Verification (FastAPI)

Stripe prepends a timestamp to the signed payload — it's not a simple body hash. Use the official stripe library to handle this correctly:

import stripe
from fastapi import FastAPI, Request, HTTPException

@app.post("/webhooks/stripe")
async def handle_stripe_webhook(request: Request):
    raw_body = await request.body()
    sig_header = request.headers.get("Stripe-Signature", "")

    try:
        event = stripe.Webhook.construct_event(
            raw_body, sig_header, settings.STRIPE_WEBHOOK_SECRET
        )
    except ValueError:
        # Body was not valid JSON
        raise HTTPException(status_code=400, detail="Invalid payload")
    except stripe.error.SignatureVerificationError:
        raise HTTPException(status_code=401, detail="Invalid Stripe signature")

    event_id = event["id"]  # e.g. "evt_1234..."
    # ... rest of deduplication logic using event_id

The construct_event call validates both the HMAC and the timestamp tolerance (default: 300 seconds), protecting against replay attacks automatically.
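
To make the scheme concrete, here is a rough sketch of what that check involves, based on Stripe's documented header format (t=<unix timestamp>,v1=<hex signature>, signed over "<timestamp>.<body>"). The function name and the now parameter are assumptions added for illustration and testing. Treat this as a way to understand the mechanism; keep using the official library in production.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_style_signature(
    raw_body: bytes,
    sig_header: str,
    secret: str,
    tolerance: int = 300,
    now: Optional[float] = None,  # injectable clock, only for testing
) -> bool:
    """Check a 't=<unix>,v1=<hex>' header against HMAC-SHA256 of '<t>.<body>'."""
    parts = [item.split("=", 1) for item in sig_header.split(",") if "=" in item]
    timestamps = [v for k, v in parts if k.strip() == "t"]
    candidates = [v for k, v in parts if k.strip() == "v1"]
    if not timestamps or not candidates:
        return False
    try:
        timestamp = int(timestamps[0])
    except ValueError:
        return False

    # Reject deliveries older than the tolerance: this is the replay protection
    current = time.time() if now is None else now
    if current - timestamp > tolerance:
        return False

    signed_payload = timestamps[0].encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # A header can carry more than one v1 value, so accept any match
    return any(
        hmac.compare_digest(expected.encode(), candidate.encode())
        for candidate in candidates
    )


# Build a header the way a sender would, then verify it
demo_secret = "whsec_test"
demo_body = b'{"id": "evt_1"}'
demo_t = 1_700_000_000
demo_sig = hmac.new(
    demo_secret.encode(), f"{demo_t}.".encode() + demo_body, hashlib.sha256
).hexdigest()
demo_header = f"t={demo_t},v1={demo_sig}"
```

Because the timestamp is part of the signed payload, an attacker cannot refresh an old capture by editing the t value without invalidating the signature.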


Next.js Implementation

Next.js API routes are stateless, so you need an external store. Redis works well for short-lived deduplication windows; Postgres works if you want a permanent audit trail.

Using Redis for Deduplication

// app/api/webhooks/payment/route.ts
import { NextRequest, NextResponse } from "next/server";
import { createClient } from "redis";
import crypto from "crypto";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

function verifySignature(
  rawBody: string,
  signature: string,
  secret: string
): boolean {
  const expected = Buffer.from(
    crypto.createHmac("sha256", secret).update(rawBody).digest("hex")
  );
  const received = Buffer.from(signature);
  // timingSafeEqual throws if the buffers differ in length, so check that first
  return (
    expected.length === received.length &&
    crypto.timingSafeEqual(expected, received)
  );
}

export async function POST(req: NextRequest) {
  const rawBody = await req.text();
  const signature = req.headers.get("x-signature") ?? "";

  if (!verifySignature(rawBody, signature, process.env.WEBHOOK_SECRET!)) {
    return NextResponse.json({ error: "Unauthorised" }, { status: 401 });
  }

  const payload = JSON.parse(rawBody);
  const eventId = payload?.id;

  if (!eventId) {
    return NextResponse.json({ error: "Missing event ID" }, { status: 400 });
  }

  const key = `webhook:${eventId}`;
  // SET NX (only set if not exists) + TTL outliving the provider's retry window
  // 3 days for Stripe (259200s), 5 hours for Shopify (18000s)
  const acquired = await redis.set(key, "processing", { NX: true, EX: 259200 });

  if (!acquired) {
    // Duplicate delivery: acknowledge without reprocessing
    return NextResponse.json({ status: "duplicate", eventId });
  }

  try {
    await processPaymentEvent(payload);
    // Update the value; passing EX again restarts the TTL at a full 3 days
    await redis.set(key, "processed", { EX: 259200 });
    return NextResponse.json({ status: "processed", eventId });
  } catch (err) {
    // Mark as failed so a retry worker can pick it up. Only the status lives in
    // Redis here, so the worker needs the payload persisted somewhere else
    await redis.set(key, "failed", { EX: 259200 });
    // Return 200 to stop provider retries; the internal worker handles re-queuing
    return NextResponse.json({ status: "queued_for_retry", eventId });
  }
}

SET NX is atomic in Redis, making it safe under concurrent retries without additional locking.
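
To see what the handler relies on, here is a toy single-process model of SET NX with a TTL, written in Python. It is not a Redis client and the InMemoryDedupStore class is purely illustrative; a lock plays the role that Redis's single-threaded command execution plays in the real thing. Twenty concurrent "deliveries" race for the same key and exactly one should win.

```python
import threading
import time


class InMemoryDedupStore:
    """Toy model of Redis `SET key value NX EX ttl` for a single process."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._expiry: dict[str, float] = {}

    def set_nx(self, key: str, ttl_seconds: float) -> bool:
        """Claim key if absent or expired; return True only for the winner."""
        now = time.monotonic()
        with self._lock:  # check and write happen as one indivisible step
            expires_at = self._expiry.get(key)
            if expires_at is not None and expires_at > now:
                return False
            self._expiry[key] = now + ttl_seconds
            return True


store = InMemoryDedupStore()
results: list[bool] = []


def deliver() -> None:
    # Every thread simulates a retry of the same event arriving at once
    results.append(store.set_nx("webhook:evt_1", ttl_seconds=259200))


threads = [threading.Thread(target=deliver) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two threads could both see the key as missing and both proceed, which is exactly the race that an atomic claim rules out.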

Shopify-Specific Header (Next.js)

Shopify sends its unique delivery ID in the X-Shopify-Webhook-Id header, not the JSON body. Use that as your deduplication key:

export async function POST(req: NextRequest) {
  const rawBody = await req.text();

  // Shopify uses X-Shopify-Webhook-Id for deduplication
  const eventId = req.headers.get("x-shopify-webhook-id");
  const hmacHeader = req.headers.get("x-shopify-hmac-sha256") ?? "";

  // Shopify HMAC is base64-encoded
  const expectedHmac = Buffer.from(
    crypto
      .createHmac("sha256", process.env.SHOPIFY_WEBHOOK_SECRET!)
      .update(rawBody)
      .digest("base64")
  );
  const receivedHmac = Buffer.from(hmacHeader);

  // timingSafeEqual throws on a length mismatch, so compare lengths first
  if (
    !eventId ||
    expectedHmac.length !== receivedHmac.length ||
    !crypto.timingSafeEqual(expectedHmac, receivedHmac)
  ) {
    return NextResponse.json({ error: "Unauthorised" }, { status: 401 });
  }

  const key = `shopify:webhook:${eventId}`;
  const acquired = await redis.set(key, "1", { NX: true, EX: 18000 }); // 5-hour TTL

  if (!acquired) {
    return NextResponse.json({ status: "duplicate", eventId });
  }

  // ... process event
}
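
If your Shopify handler lives on the FastAPI side instead, the same base64 HMAC check might look like this in Python. The verify_shopify_hmac name is an assumption for this sketch. Unlike Node's timingSafeEqual, hmac.compare_digest accepts inputs of different lengths, so no separate length check is needed.

```python
import base64
import hashlib
import hmac


def verify_shopify_hmac(raw_body: bytes, hmac_header: str, secret: str) -> bool:
    """Compare the base64 HMAC-SHA256 of the raw body with the header value."""
    digest = hmac.new(secret.encode(), raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    # Compare as bytes; returns False (no exception) when lengths differ
    return hmac.compare_digest(expected, hmac_header.encode())


# Build a header the way the sender would, then verify it
demo_secret = "shpss_test"
demo_body = b'{"id": 820982911946154508}'
demo_header = base64.b64encode(
    hmac.new(demo_secret.encode(), demo_body, hashlib.sha256).digest()
).decode()
```

As with the other handlers, the check has to run against the raw request bytes, not a re-serialised JSON object.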

Handling Partial Failures

This is the part most guides skip, and it's where real production incidents happen.

The failure mode: you've recorded the event ID (so the deduplication guard passes and won't let you in again), but processing throws an exception halfway through. You have a partially-applied event in your system and no way to retry it through the webhook handler, because the deduplication layer will block future attempts.

The solution is a two-layer status machine:

pending → processing → processed
               ↘ failed → (retry worker picks up) → processed | dead_letter
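
One way to keep code honest about these states is to encode the allowed transitions explicitly. The sketch below mirrors the diagram; the Status enum, the ALLOWED table, and the transition helper are illustrative names, not part of the handlers in this article.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    PROCESSED = "processed"
    FAILED = "failed"
    DEAD_LETTER = "dead_letter"


# Each state maps to the states it may move to, following the diagram
ALLOWED = {
    Status.PENDING: {Status.PROCESSING},
    Status.PROCESSING: {Status.PROCESSED, Status.FAILED},
    Status.FAILED: {Status.PROCESSED, Status.DEAD_LETTER},
    Status.PROCESSED: set(),      # terminal
    Status.DEAD_LETTER: set(),    # terminal, needs human review
}


def transition(current: Status, new: Status) -> Status:
    """Return the new status, or raise if the move is not in the diagram."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```

A guard like this makes it hard for a bug to flip a processed event back to failed and trigger a second run.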

Background Retry Worker (FastAPI / Python)

from sqlalchemy import text

# get_db_session and process_payment_event are assumed to be defined elsewhere in your app
MAX_RETRIES = 5

async def webhook_retry_worker():
    """
    Runs on a schedule (e.g., every 60 seconds via APScheduler or a cron job).
    Picks up failed webhook events and retries them with exponential backoff.
    """
    async with get_db_session() as db:
        # Find events that failed and haven't exceeded max retries.
        # Exponential backoff: wait 2^retry_count minutes between attempts.
        # Rows marked failed by the webhook handler have no last_attempted_at
        # yet, so the IS NULL branch makes them eligible straight away.
        failed_events = await db.execute(
            text("""
                SELECT event_id, source, payload, retry_count
                FROM webhook_events
                WHERE status = 'failed'
                  AND retry_count < :max_retries
                  AND (
                    last_attempted_at IS NULL
                    OR last_attempted_at
                       < NOW() - INTERVAL '1 minute' * POWER(2, retry_count)
                  )
                ORDER BY received_at ASC
                LIMIT 20
            """),
            {"max_retries": MAX_RETRIES},
        )

        for row in failed_events.fetchall():
            try:
                await process_payment_event(row.payload)
                await db.execute(
                    text("""
                        UPDATE webhook_events
                        SET status = 'processed', processed_at = NOW()
                        WHERE event_id = :eid
                    """),
                    {"eid": row.event_id},
                )
            except Exception:
                new_count = row.retry_count + 1
                new_status = "dead_letter" if new_count >= MAX_RETRIES else "failed"
                await db.execute(
                    text("""
                        UPDATE webhook_events
                        SET status = :status,
                            retry_count = :count,
                            last_attempted_at = NOW()
                        WHERE event_id = :eid
                    """),
                    {"status": new_status, "count": new_count, "eid": row.event_id},
                )

        await db.commit()

Add the retry_count and last_attempted_at columns to your schema:

ALTER TABLE webhook_events
    ADD COLUMN retry_count INT NOT NULL DEFAULT 0,
    ADD COLUMN last_attempted_at TIMESTAMPTZ,
    ADD COLUMN processed_at TIMESTAMPTZ;
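
The worker's comment describes a backoff of 2^retry_count minutes between attempts. As a sketch of that rule in plain Python, using the two new columns: the backoff_delay and is_due helpers are hypothetical names, and treating a NULL last_attempted_at as "due now" is an assumption you may want to change.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_RETRIES = 5


def backoff_delay(retry_count: int) -> timedelta:
    """Wait 2^retry_count minutes: 1, 2, 4, 8, 16, ..."""
    return timedelta(minutes=2 ** retry_count)


def is_due(
    retry_count: int,
    last_attempted_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Decide whether a failed event should be retried on this worker run."""
    if retry_count >= MAX_RETRIES:
        return False  # belongs in dead_letter, not in the retry loop
    if last_attempted_at is None:
        return True   # assumption: never retried by the worker, so try now
    return now - last_attempted_at >= backoff_delay(retry_count)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
```

With five retries the last wait is 16 minutes, so an event is either processed or dead-lettered roughly half an hour after its first failure.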

Why Return 200 Even on Processing Failure?

This is counter-intuitive but critical. When your handler fails partway through and you return 5xx, the provider retries — but your deduplication layer now blocks the retry because the event ID is already recorded with status = 'failed'. You've created a deadlock: the provider keeps retrying, you keep blocking, nobody wins.

The correct approach: return 200 to stop the provider retry loop, then use your own internal worker to retry with proper backoff and visibility. You control the retry schedule, can alert on dead-letter events, and can manually replay specific events without touching provider dashboards.


The Accept-Then-Queue Architecture

For high-throughput or slow downstream systems, consider decoupling receipt from processing entirely:

Webhook arrives → Verify signature → Insert event ID → Enqueue job → Return 200
                                                            ↓
                                             Worker processes asynchronously

In FastAPI, this maps cleanly to BackgroundTasks for lightweight cases, or Celery/ARQ for durable queues:

from fastapi import BackgroundTasks

@app.post("/webhooks/payment")
async def handle_payment_webhook(
    request: Request,
    background_tasks: BackgroundTasks,
    db: AsyncSession = Depends(get_db),
):
    raw_body = await request.body()
    # ... signature verification ...
    payload = json.loads(raw_body)
    event_id = payload.get("id")

    try:
        await db.execute(
            text("INSERT INTO webhook_events (event_id, source, payload) VALUES (:eid, :src, :p)"),
            {"eid": event_id, "src": "payment", "p": json.dumps(payload)},
        )
        await db.commit()
    except IntegrityError:
        await db.rollback()
        return {"status": "duplicate", "event_id": event_id}

    # Queue processing asynchronously; the response returns immediately
    background_tasks.add_task(process_payment_event, payload)
    return {"status": "accepted", "event_id": event_id}

Caveat: BackgroundTasks runs in the same process. If the server restarts before the task completes, the event stays pending in the database. Your retry worker can recover it only if the status is updated inside the background task (not before queuing it) and the worker also sweeps pending rows older than some cutoff; the worker shown earlier selects only failed rows. For true durability, use a persistent queue (Celery with Redis/RabbitMQ, or ARQ).


Making Downstream Operations Idempotent

The deduplication layer prevents double-processing of the same webhook event, but your business logic also needs to be idempotent. Consider:

  • Database upserts over inserts: INSERT ... ON CONFLICT (order_id) DO NOTHING prevents double-created orders even if your deduplication layer has a race condition at startup.
  • Idempotency keys on outbound API calls — When a webhook triggers a Stripe charge or a SendGrid email, pass an idempotency key (the webhook event ID works well) to make that call idempotent too.
  • Conditional updates — Use UPDATE orders SET status = 'paid' WHERE status = 'pending' AND id = :order_id rather than an unconditional update. If the record is already paid, the update is a no-op.

# Example: idempotent order status update
import logging
from sqlalchemy import text

logger = logging.getLogger(__name__)

async def process_payment_event(payload: dict):
    order_id = payload["data"]["object"]["metadata"]["order_id"]
    amount = payload["data"]["object"]["amount"]

    # get_db_session is assumed to be the same session helper the retry worker uses
    async with get_db_session() as db:
        result = await db.execute(
            text("""
                UPDATE orders
                SET status = 'paid', paid_at = NOW(), amount_paid = :amount
                WHERE id = :order_id
                  AND status = 'pending'
            """),
            {"order_id": order_id, "amount": amount},
        )
        await db.commit()

    # If rowcount == 0, the order was already paid, which is fine
    if result.rowcount == 0:
        logger.info("Order %s already paid, skipping", order_id)
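
To watch the conditional update behave as a no-op on a replay, here is a small self-contained demonstration. It uses an in-memory SQLite table in place of your real orders table, and the mark_paid helper and sample values are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute(
    "CREATE TABLE orders ("
    " id TEXT PRIMARY KEY,"
    " status TEXT NOT NULL,"
    " amount_paid INTEGER)"
)
conn.execute("INSERT INTO orders (id, status) VALUES ('ord_1', 'pending')")


def mark_paid(order_id: str, amount: int) -> bool:
    """Apply the payment once; return False if the order was not pending."""
    cur = conn.execute(
        "UPDATE orders SET status = 'paid', amount_paid = ? "
        "WHERE id = ? AND status = 'pending'",
        (amount, order_id),
    )
    conn.commit()
    return cur.rowcount == 1


first = mark_paid("ord_1", 4200)    # original delivery
second = mark_paid("ord_1", 9999)   # replayed event with a different amount
paid = conn.execute("SELECT amount_paid FROM orders WHERE id = 'ord_1'").fetchone()[0]
```

The second call changes nothing, so the stored amount stays at the value written by the first delivery even if the deduplication layer let the replay through.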

Quick Checklist

  • Verify the signature before any database work — use hmac.compare_digest / crypto.timingSafeEqual
  • Use the provider's event ID, not your own generated ID
  • Match TTLs to the provider's retry window (Stripe: 3 days; Shopify: ~5 hours)
  • Return 200 on duplicates and on processing failures — let your internal worker handle retries
  • Track status (pending → processed | failed → dead_letter) so partial failures are recoverable
  • Log duplicates — a sustained spike signals upstream issues (provider instability, your endpoint returning 5xx)
  • Make downstream operations idempotent themselves, not just the intake layer
  • Alert on dead_letter events — these need human review

Final Thoughts

Idempotent webhook processing isn't complex: the core pattern fits in about 30 lines of code. What's hard is remembering you need it before the moment you discover you don't have it, and understanding that the deduplication layer is only half the story. The other half is graceful partial-failure handling so that a processing error doesn't lock you out of ever retrying an event.

Build this in from day one. Both the FastAPI and Next.js implementations above are production-deployable as-is; adjust the signature header names and TTLs to match your specific provider. And configure your dead-letter alerts before you need them — not after your Monday morning.

The deduplication store is your safety net. Trust it, and your Monday mornings will be considerably quieter.

Damian Hodgkiss

Senior Staff Engineer at Sumo Group, leading development of AppSumo marketplace. Technical solopreneur with 25+ years of experience building SaaS products.
