Idempotent Webhook Processing: Deduplication and Retry Handling in FastAPI and Next.js

Webhooks are deceptively simple until they bite you. A payment provider fires an event, your server hiccups, they retry, and suddenly you've charged a customer twice. I've seen this pattern cause real damage — not hypothetical damage, real "refund queue on a Monday morning" damage. After 25 years of building SaaS products, idempotent webhook processing is one of those patterns I now reach for before anything else.

This tutorial covers a production-ready approach to deduplication and retry handling, with concrete code for both a FastAPI backend and a Next.js API route. It also covers what happens when processing partially fails, how to build a robust background retry layer, and the per-provider specifics you need to configure correctly.

Why Webhooks Need Idempotency

HTTP is unreliable. Every major provider delivers events on an at-least-once basis — never exactly-once. Stripe's own documentation explicitly states that endpoints "might occasionally receive the same event more than once." AWS SQS Standard queues make the same guarantee. Treating delivery as exactly-once is a category error; build for duplicates by default.

The most common trigger is not a dramatic network failure. It's your handler finishing the work but responding a few milliseconds too late, so the provider's timeout fires, it assumes failure, and retries an operation that already succeeded. The retry loop then processes a completed event a second time.

Idempotency means processing the same event multiple times produces exactly the same result as processing it once. The mechanism is straightforward: record a unique identifier for each event before you act on it, and reject duplicates.

Provider Retry Schedules

Different providers have very different retry windows. Configure your deduplication TTLs and dead-letter queues against the specific provider you're integrating — not a single global number.

Provider	Retry budget	Backoff strategy	Unique ID header/field	Notes
Stripe	Up to 3 days (live mode)	Exponential backoff	`event.id` in JSON body	Includes `Stripe-Signature` with timestamp for replay protection
Shopify	8 attempts over ~4 hours	Fixed intervals	`X-Shopify-Webhook-Id` header	Subscription may be auto-deleted after 8 consecutive failures
Svix	~8 attempts over ~1 day	Exponential	`webhook-id` header	Used by many platforms as their webhook infrastructure
GitHub	No automatic retries	None (manual or API-driven redelivery)	`X-GitHub-Delivery` header	A redelivery reuses the delivery ID of the original attempt

Key implication: Your deduplication store TTL must outlive the provider's full retry window. For Stripe, that means at minimum 3 days. For Shopify, 4–5 hours is sufficient. Using a hard-coded 24-hour TTL is fine as a default but will miss Stripe's late retries unless you bump it.
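As a rough sketch, the TTL choice can live in one small lookup rather than being scattered through handlers. The retry windows come from the table above; the one-hour safety margin, the 24-hour default, and the function name `dedup_ttl_seconds` are my own assumptions for illustration.

```python
from datetime import timedelta

# Retry windows from the table above. Add your own providers as needed.
PROVIDER_RETRY_WINDOW = {
    "stripe": timedelta(days=3),
    "shopify": timedelta(hours=4),
}

SAFETY_MARGIN = timedelta(hours=1)   # assumed buffer for late retries
DEFAULT_TTL = timedelta(hours=24)    # fallback for providers not listed


def dedup_ttl_seconds(provider: str) -> int:
    """Return a deduplication TTL that outlives the provider's retry window."""
    window = PROVIDER_RETRY_WINDOW.get(provider.lower())
    if window is None:
        return int(DEFAULT_TTL.total_seconds())
    return int((window + SAFETY_MARGIN).total_seconds())
```

With these numbers Shopify lands on the 5-hour figure mentioned above, and Stripe gets slightly more than 3 days.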

The Core Pattern

Verify the signature first, before parsing the payload or touching the database.
Extract a unique event ID from the verified payload or headers.
Attempt to insert that ID into a deduplication store (database or cache).
If the insert succeeds, process the event.
If the insert fails (duplicate key), return 200 immediately and do not reprocess.

Returning 200 on a duplicate is intentional. You're telling the provider "I've handled this," which stops the retry loop. Returning 4xx or 5xx on a duplicate makes the provider keep retrying until its retry budget runs out, and some providers (Shopify, for one) may then remove the subscription.
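Before the framework-specific versions, here is the claim-then-process step reduced to a minimal sketch. It uses an in-memory SQLite database purely so it runs anywhere; the table layout and the names `make_store` and `handle_event` are illustrative assumptions, and signature verification is left out to keep the focus on the insert.

```python
import sqlite3
from typing import Callable


def make_store() -> sqlite3.Connection:
    """Create an in-memory deduplication store keyed on event_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE webhook_events (event_id TEXT PRIMARY KEY)")
    return conn


def handle_event(
    conn: sqlite3.Connection,
    event_id: str,
    process: Callable[[str], None],
) -> str:
    """Claim the event with an insert; process only if the claim succeeded."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO webhook_events (event_id) VALUES (?)",
                (event_id,),
            )
    except sqlite3.IntegrityError:
        # Primary-key violation: this event ID was already claimed
        return "duplicate"
    process(event_id)
    return "processed"
```

The important property is that the insert is the lock. There is no separate "check if exists" query, so two concurrent deliveries cannot both pass the guard.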

FastAPI Implementation

Setting Up the Deduplication Table

Using PostgreSQL:

CREATE TABLE webhook_events (
    event_id    TEXT PRIMARY KEY,
    source      TEXT NOT NULL,
    received_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    status      TEXT NOT NULL DEFAULT 'pending',  -- pending | processed | failed | dead_letter
    payload     JSONB
);

-- Index for the background retry worker
CREATE INDEX webhook_events_status_idx ON webhook_events (status, received_at)
    WHERE status IN ('pending', 'failed');

The PRIMARY KEY constraint does the heavy lifting — attempting to insert a duplicate event_id raises a unique violation. The status column supports the partial-failure recovery pattern described below.

The FastAPI Endpoint

from fastapi import FastAPI, Request, HTTPException, Depends
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.exc import IntegrityError
from sqlalchemy import text
import hashlib, hmac, json

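app = FastAPI()

# get_db, settings, and process_payment_event are assumed to be defined elsewhere in your app.

# Plain function, not a coroutine: the endpoint below calls it without await.
# (An un-awaited coroutine is always truthy, so the check would never reject.)
def verify_signature(raw_body: bytes, signature: str, secret: str) -> bool: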
    expected = hmac.new(
        secret.encode(), raw_body, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

@app.post("/webhooks/payment")
async def handle_payment_webhook(
    request: Request,
    db: AsyncSession = Depends(get_db),
):
    raw_body = await request.body()
    signature = request.headers.get("X-Signature", "")

    # 1. Verify signature BEFORE touching the database
    if not verify_signature(raw_body, signature, settings.WEBHOOK_SECRET):
        raise HTTPException(status_code=401, detail="Invalid signature")

    payload = json.loads(raw_body)
    event_id = payload.get("id")  # provider-supplied unique ID

    if not event_id:
        raise HTTPException(status_code=400, detail="Missing event ID")

    # 2. Attempt to claim the event (atomic insert)
    try:
        await db.execute(
            text(
                "INSERT INTO webhook_events (event_id, source, status, payload) "
                "VALUES (:event_id, :source, 'pending', :payload)"
            ),
            {"event_id": event_id, "source": "payment", "payload": json.dumps(payload)},
        )
        await db.commit()
    except IntegrityError:
        # Duplicate — already processed or in-flight, acknowledge and return
        await db.rollback()
        return {"status": "duplicate", "event_id": event_id}

    # 3. Process — we are guaranteed to be the first and only handler
    try:
        await process_payment_event(payload)
        await db.execute(
            text("UPDATE webhook_events SET status = 'processed' WHERE event_id = :eid"),
            {"eid": event_id},
        )
        await db.commit()
    except Exception:
        # Discard any half-applied work on this session first, then
        # mark the event failed so the background worker can retry it
        await db.rollback()
        await db.execute(
            text("UPDATE webhook_events SET status = 'failed' WHERE event_id = :eid"),
            {"eid": event_id},
        )
        await db.commit()
        # Still return 200: the event was received and will be retried internally.
        # Returning 5xx here would cause the provider to retry before your worker does
        return {"status": "queued_for_retry", "event_id": event_id}

    return {"status": "processed", "event_id": event_id}

Always verify the signature before hitting the database — it's cheap and keeps bad actors from polluting your deduplication store. Use hmac.compare_digest rather than == to avoid timing attacks.
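One detail that trips people up: the HMAC must be computed over the raw request bytes, not over JSON you have parsed and re-serialized. The sketch below shows why, using only the standard library. The `sign` helper and the sample secret are assumptions for the demo; a real provider does the signing on its side.

```python
import hashlib
import hmac
import json


def sign(raw_body: bytes, secret: str) -> str:
    """Hex-encoded HMAC-SHA256 of the body, as a generic provider might send it."""
    return hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()


def verify_signature(raw_body: bytes, signature: str, secret: str) -> bool:
    expected = sign(raw_body, secret)
    # Compare as bytes in constant time
    return hmac.compare_digest(expected.encode(), signature.encode())


secret = "whsec_demo"                       # hypothetical secret
raw = b'{"id":"evt_1","amount":100}'        # exact bytes on the wire
signature = sign(raw, secret)

# Re-serializing changes whitespace, so the bytes (and the HMAC) no longer match
reserialized = json.dumps(json.loads(raw)).encode()

raw_ok = verify_signature(raw, signature, secret)
reserialized_ok = verify_signature(reserialized, signature, secret)
```

Here `raw_ok` is true and `reserialized_ok` is false, which is why both handlers in this article read the body as raw bytes or text before doing anything else.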

Stripe-Specific Signature Verification (FastAPI)

Stripe prepends a timestamp to the signed payload — it's not a simple body hash. Use the official stripe library to handle this correctly:

import stripe
from fastapi import FastAPI, Request, HTTPException

@app.post("/webhooks/stripe")
async def handle_stripe_webhook(request: Request):
    raw_body = await request.body()
    sig_header = request.headers.get("Stripe-Signature", "")

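    try:
        event = stripe.Webhook.construct_event(
            raw_body, sig_header, settings.STRIPE_WEBHOOK_SECRET
        )
    except ValueError:
        # Body could not be parsed as JSON
        raise HTTPException(status_code=400, detail="Invalid payload")
    except stripe.error.SignatureVerificationError:
        # Newer releases of the library also expose this as stripe.SignatureVerificationError
        raise HTTPException(status_code=401, detail="Invalid Stripe signature")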

    event_id = event["id"]  # e.g. "evt_1234..."
    # ... rest of deduplication logic using event_id

The construct_event call validates both the HMAC and the timestamp tolerance (default: 300 seconds), protecting against replay attacks automatically.
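To make the mechanism less magical, here is a standard-library sketch of what a timestamped signature check does: parse `t=` and `v1=` from the header, sign `timestamp.body`, compare, then check the age. This is for understanding only and is not a replacement for the official library. The function names and the `now` parameter (there so the check is testable) are my own.

```python
import hashlib
import hmac
import time
from typing import List, Optional, Tuple


def parse_header(header: str) -> Tuple[Optional[str], List[str]]:
    """Split 't=...,v1=...,v1=...' into a timestamp and candidate signatures."""
    timestamp = None
    signatures = []
    for item in header.split(","):
        key, _, value = item.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            signatures.append(value)
    return timestamp, signatures


def verify_timestamped(
    raw_body: bytes,
    header: str,
    secret: str,
    tolerance: int = 300,
    now: Optional[float] = None,
) -> bool:
    timestamp, signatures = parse_header(header)
    if timestamp is None or not timestamp.isdigit() or not signatures:
        return False
    # The signed payload is the timestamp, a dot, then the raw body
    signed_payload = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    if not any(hmac.compare_digest(expected.encode(), s.encode()) for s in signatures):
        return False
    current = time.time() if now is None else now
    # Reject signatures outside the tolerance window (replay protection)
    return abs(current - int(timestamp)) <= tolerance
```

Because the timestamp is part of the signed payload, an attacker cannot simply refresh it on a captured request without invalidating the signature.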

Next.js Implementation

Next.js API routes are stateless, so you need an external store. Redis works well for short-lived deduplication windows; Postgres works if you want a permanent audit trail.

Using Redis for Deduplication

// app/api/webhooks/payment/route.ts
import { NextRequest, NextResponse } from "next/server";
import { createClient } from "redis";
import crypto from "crypto";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

function verifySignature(
  rawBody: string,
  signature: string,
  secret: string
): boolean {
  const expected = crypto
    .createHmac("sha256", secret)
    .update(rawBody)
    .digest("hex");
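  const expectedBuf = Buffer.from(expected);
  const signatureBuf = Buffer.from(signature);
  // timingSafeEqual throws if the lengths differ, so check that first
  return (
    expectedBuf.length === signatureBuf.length &&
    crypto.timingSafeEqual(expectedBuf, signatureBuf)
  );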
}

export async function POST(req: NextRequest) {
  const rawBody = await req.text();
  const signature = req.headers.get("x-signature") ?? "";

  if (!verifySignature(rawBody, signature, process.env.WEBHOOK_SECRET!)) {
    return NextResponse.json({ error: "Unauthorised" }, { status: 401 });
  }

  const payload = JSON.parse(rawBody);
  const eventId = payload?.id;

  if (!eventId) {
    return NextResponse.json({ error: "Missing event ID" }, { status: 400 });
  }

  const key = `webhook:${eventId}`;
  // SET NX (only set if not exists) + TTL outliving the provider's retry window
  // 3 days for Stripe (259200s), 5 hours for Shopify (18000s)
  const acquired = await redis.set(key, "processing", { NX: true, EX: 259200 });

  if (!acquired) {
    // Duplicate delivery — acknowledge without reprocessing
    return NextResponse.json({ status: "duplicate", eventId });
  }

  try {
    await processPaymentEvent(payload);
    await redis.set(key, "processed", { EX: 259200 }); // update value, restart the 3-day TTL
    return NextResponse.json({ status: "processed", eventId });
  } catch (err) {
    // Mark as failed; a retry worker will also need the payload stored somewhere durable
    await redis.set(key, "failed", { EX: 259200 });
    // Return 200 to stop provider retries — internal worker handles re-queuing
    return NextResponse.json({ status: "queued_for_retry", eventId });
  }
}

SET NX is atomic in Redis, making it safe under concurrent retries without additional locking.
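For unit tests that should not depend on a live Redis, a tiny in-memory stand-in with the same "set only if absent, with expiry" semantics is often enough. This is a test double under my own naming (`InMemoryDedupStore`, `claim`), not a Redis client, and it gives no cross-process guarantees.

```python
import time
from typing import Callable, Dict


class InMemoryDedupStore:
    """Mimics SET key value NX EX ttl for a single process."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expiry: Dict[str, float] = {}
        self._clock = clock

    def claim(self, key: str, ttl_seconds: int) -> bool:
        """Return True if the key was free (or expired) and is now claimed."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # still held: treat as a duplicate delivery
        self._expiry[key] = now + ttl_seconds
        return True
```

Injecting the clock lets a test fast-forward past the TTL without sleeping.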

Shopify-Specific Header (Next.js)

Shopify sends its unique delivery ID in the X-Shopify-Webhook-Id header, not the JSON body. Use that as your deduplication key:

export async function POST(req: NextRequest) {
  const rawBody = await req.text();

  // Shopify uses X-Shopify-Webhook-Id for deduplication
  const eventId = req.headers.get("x-shopify-webhook-id");
  const hmacHeader = req.headers.get("x-shopify-hmac-sha256") ?? "";

  // Shopify HMAC is base64-encoded
  const expectedHmac = crypto
    .createHmac("sha256", process.env.SHOPIFY_WEBHOOK_SECRET!)
    .update(rawBody)
    .digest("base64");

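  const expectedBuf = Buffer.from(expectedHmac);
  const receivedBuf = Buffer.from(hmacHeader);

  // timingSafeEqual throws on a length mismatch, so compare lengths first
  if (
    !eventId ||
    expectedBuf.length !== receivedBuf.length ||
    !crypto.timingSafeEqual(expectedBuf, receivedBuf)
  ) {
    return NextResponse.json({ error: "Unauthorised" }, { status: 401 });
  }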

  const key = `shopify:webhook:${eventId}`;
  const acquired = await redis.set(key, "1", { NX: true, EX: 18000 }); // 5-hour TTL

  if (!acquired) {
    return NextResponse.json({ status: "duplicate", eventId });
  }

  // ... process event
}
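If the Shopify endpoint lives on the FastAPI side instead, the same base64 HMAC check looks roughly like this in Python. The function names are illustrative; the only Shopify-specific detail carried over from the route above is that the digest is base64-encoded rather than hex.

```python
import base64
import hashlib
import hmac


def shopify_hmac(raw_body: bytes, secret: str) -> str:
    """Base64-encoded HMAC-SHA256 of the raw request body."""
    digest = hmac.new(secret.encode(), raw_body, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def verify_shopify(raw_body: bytes, hmac_header: str, secret: str) -> bool:
    expected = shopify_hmac(raw_body, secret)
    # compare_digest handles unequal lengths without raising
    return hmac.compare_digest(expected.encode(), hmac_header.encode())
```

Unlike Node's `timingSafeEqual`, `hmac.compare_digest` returns false on a length mismatch instead of throwing, so no separate length check is needed here.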

Handling Partial Failures

This is the part most guides skip, and it's where real production incidents happen.

The failure mode: you've recorded the event ID (so the deduplication guard passes and won't let you in again), but processing throws an exception halfway through. You have a partially-applied event in your system and no way to retry it through the webhook handler, because the deduplication layer will block future attempts.

The solution is a two-layer status machine:

pending → processed
        ↘ failed → (retry worker picks up) → processed | dead_letter
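One way to keep that state machine honest is to encode the allowed transitions in a small table and check against it before every status update. This sketch uses the four statuses the code in this article writes (pending, processed, failed, dead_letter); the `ALLOWED_TRANSITIONS` name and the helper are my own.

```python
from typing import Dict, Set

# Which statuses each status may move to. Terminal states map to an empty set.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"processed", "failed"},
    "failed": {"processed", "failed", "dead_letter"},  # failed -> failed is a retry that failed again
    "processed": set(),
    "dead_letter": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if moving from `current` to `new` is a legal status change."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

A guard like this catches bugs such as a late worker flipping an already processed event back to failed.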

Background Retry Worker (FastAPI / Python)

import json

from sqlalchemy import text

MAX_RETRIES = 5

async def webhook_retry_worker():
    """
    Runs on a schedule (e.g., every 60 seconds via APScheduler or a cron job).
    Picks up failed webhook events and retries them with exponential backoff.
    """
    async with get_db_session() as db:
        # Find failed events that are under the retry limit and past their
        # backoff delay of 2^retry_count minutes. last_attempted_at is NULL
        # until the worker's first attempt, so fall back to received_at.
        failed_events = await db.execute(
            text("""
                SELECT event_id, source, payload, retry_count
                FROM webhook_events
                WHERE status = 'failed'
                  AND retry_count < :max_retries
                  AND COALESCE(last_attempted_at, received_at)
                      + INTERVAL '1 minute' * POWER(2, retry_count) < NOW()
                ORDER BY received_at ASC
                LIMIT 20
                FOR UPDATE SKIP LOCKED
            """),
            {"max_retries": MAX_RETRIES},
        )
        # FOR UPDATE SKIP LOCKED lets several worker instances run side by side
        # without picking up the same rows.

        for row in failed_events.fetchall():
            # Depending on the driver setup, JSONB may arrive as a str or a dict
            payload = json.loads(row.payload) if isinstance(row.payload, str) else row.payload
            try:
                await process_payment_event(payload)
                await db.execute(
                    text("""
                        UPDATE webhook_events
                        SET status = 'processed', processed_at = NOW()
                        WHERE event_id = :eid
                    """),
                    {"eid": row.event_id}
                )
            except Exception:
                new_count = row.retry_count + 1
                new_status = "dead_letter" if new_count >= MAX_RETRIES else "failed"
                await db.execute(
                    text("""
                        UPDATE webhook_events
                        SET status = :status,
                            retry_count = :count,
                            last_attempted_at = NOW()
                        WHERE event_id = :eid
                    """),
                    {"status": new_status, "count": new_count, "eid": row.event_id}
                )

        await db.commit()

Add the retry_count, last_attempted_at, and processed_at columns the worker relies on to your schema:

ALTER TABLE webhook_events
    ADD COLUMN retry_count      INT NOT NULL DEFAULT 0,
    ADD COLUMN last_attempted_at TIMESTAMPTZ,
    ADD COLUMN processed_at      TIMESTAMPTZ;
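The backoff arithmetic is easy to get subtly wrong, so it can help to pull it into pure functions you can test without a database. This is a sketch under assumptions: the 2^retry_count-minute schedule and the limit of 5 come from the worker above, while the 60-minute cap and the function names are mine.

```python
from datetime import datetime, timedelta, timezone

MAX_RETRIES = 5


def backoff_delay(retry_count: int, cap_minutes: int = 60) -> timedelta:
    """Wait 2^retry_count minutes, capped so delays never grow unbounded."""
    return timedelta(minutes=min(2 ** retry_count, cap_minutes))


def is_due(last_attempted_at: datetime, retry_count: int, now: datetime) -> bool:
    """True once the backoff delay since the last attempt has elapsed."""
    return now - last_attempted_at >= backoff_delay(retry_count)


def status_after_failure(new_retry_count: int) -> str:
    """Decide whether a failed retry stays retryable or goes to dead letter."""
    return "dead_letter" if new_retry_count >= MAX_RETRIES else "failed"


# Example: an event that failed twice and was last tried 3 minutes ago
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last = now - timedelta(minutes=3)
due = is_due(last, retry_count=2, now=now)  # needs 4 minutes, so not yet
```

Using timezone-aware datetimes throughout avoids the naive-versus-aware comparison errors that show up when Python timestamps meet TIMESTAMPTZ columns.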

Why Return `200` Even on Processing Failure?

This is counter-intuitive but critical. When your handler fails partway through and you return 5xx, the provider retries, but your deduplication layer now treats that retry as a duplicate because the event ID is already recorded with status = 'failed'. The retry is acknowledged without doing any work, so the provider's retry has been spent and the event is still unprocessed. Nobody wins.

The correct approach: return 200 to stop the provider retry loop, then use your own internal worker to retry with proper backoff and visibility. You control the retry schedule, can alert on dead-letter events, and can manually replay specific events without touching provider dashboards.

The Accept-Then-Queue Architecture

For high-throughput or slow downstream systems, consider decoupling receipt from processing entirely:

Webhook arrives → Verify signature → Insert event ID → Enqueue job → Return 200
                                                              ↓
                                                    Worker processes asynchronously

In FastAPI, this maps cleanly to BackgroundTasks for lightweight cases, or Celery/ARQ for durable queues:

from fastapi import BackgroundTasks

@app.post("/webhooks/payment")
async def handle_payment_webhook(
    request: Request,
    background_tasks: BackgroundTasks,
    db: AsyncSession = Depends(get_db),
):
    raw_body = await request.body()
    # ... signature verification ...
    payload = json.loads(raw_body)
    event_id = payload.get("id")

    try:
        await db.execute(
            text("INSERT INTO webhook_events (event_id, source, payload) VALUES (:eid, :src, :p)"),
            {"eid": event_id, "src": "payment", "p": json.dumps(payload)},
        )
        await db.commit()
    except IntegrityError:
        await db.rollback()
        return {"status": "duplicate", "event_id": event_id}

    # Queue processing asynchronously — response returns immediately
    background_tasks.add_task(process_payment_event, payload)
    return {"status": "accepted", "event_id": event_id}

Caveat: BackgroundTasks runs in the same process. If the server restarts before the task completes, the event is left as pending in the database. Your retry worker can recover it, but only if the status is updated inside the background task when it finishes (not by the handler before queuing) and the worker also looks for stale pending rows. For true durability, use a persistent queue (Celery with Redis/RabbitMQ, or ARQ).
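A small wrapper makes the "update status inside the task" rule hard to forget. In this sketch, `process` and `mark_status` are injected callables; `mark_status` is a hypothetical helper that would run the UPDATE against webhook_events in a real app.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def process_and_mark(
    event_id: str,
    payload: Dict[str, Any],
    process: Callable[[Dict[str, Any]], Awaitable[None]],
    mark_status: Callable[[str, str], Awaitable[None]],
) -> str:
    """Run the processor, then record the outcome. The row stays 'pending' until then."""
    try:
        await process(payload)
    except Exception:
        # Leave a trail the retry worker can find
        await mark_status(event_id, "failed")
        return "failed"
    await mark_status(event_id, "processed")
    return "processed"
```

In the endpoint above you would queue it with `background_tasks.add_task(process_and_mark, event_id, payload, process_payment_event, mark_status)`. If the process dies mid-task, nothing has marked the row, so it remains pending and visible.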

Making Downstream Operations Idempotent

The deduplication layer prevents double-processing of the same webhook event, but your business logic also needs to be idempotent. Consider:

Database upserts over inserts — INSERT ... ON CONFLICT (order_id) DO NOTHING prevents double-created orders even if your deduplication layer has a race condition at startup.
Idempotency keys on outbound API calls — When a webhook triggers a Stripe charge or a SendGrid email, pass an idempotency key (the webhook event ID works well) to make that call idempotent too.
Conditional updates — Use UPDATE orders SET status = 'paid' WHERE status = 'pending' AND id = :order_id rather than an unconditional update. If the record is already paid, the update is a no-op.

# Example: idempotent order status update
import logging

from sqlalchemy import text

logger = logging.getLogger(__name__)

async def process_payment_event(payload: dict):
    order_id = payload["data"]["object"]["metadata"]["order_id"]
    amount = payload["data"]["object"]["amount"]

    # get_db_session is the same assumed session helper the retry worker uses
    async with get_db_session() as db:
        result = await db.execute(
            text("""
                UPDATE orders
                SET status = 'paid', paid_at = NOW(), amount_paid = :amount
                WHERE id = :order_id
                  AND status = 'pending'
            """),
            {"order_id": order_id, "amount": amount}
        )
        updated = result.rowcount
        await db.commit()

    # If no row changed, the order was already paid (or is not pending), which is fine
    if updated == 0:
        logger.info("Order %s already paid, skipping", order_id)
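The upsert item from the list can be sketched the same way. This version uses an in-memory SQLite database so it runs without setup (SQLite accepts the same ON CONFLICT ... DO NOTHING clause as PostgreSQL); the `orders` layout and the function name are assumptions for the demo.

```python
import sqlite3


def make_orders_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount INTEGER)")
    return conn


def create_order_once(conn: sqlite3.Connection, order_id: str, amount: int) -> bool:
    """Insert the order unless it already exists. Returns True if a row was created."""
    with conn:
        cursor = conn.execute(
            "INSERT INTO orders (order_id, amount) VALUES (?, ?) "
            "ON CONFLICT (order_id) DO NOTHING",
            (order_id, amount),
        )
    return cursor.rowcount == 1
```

Calling it twice with the same order ID leaves exactly one row, with the first amount intact, so a replayed event cannot create or overwrite an order.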

Quick Checklist

Verify the signature before any database work — use hmac.compare_digest / crypto.timingSafeEqual
Use the provider's event ID, not your own generated ID
Match TTLs to the provider's retry window (Stripe: 3 days; Shopify: ~5 hours)
Return 200 on duplicates and on processing failures — let your internal worker handle retries
Track status (pending → processed | failed → dead_letter) so partial failures are recoverable
Log duplicates — a sustained spike signals upstream issues (provider instability, your endpoint returning 5xx)
Make downstream operations idempotent themselves, not just the intake layer
Alert on dead_letter events — these need human review

Final Thoughts

Idempotent webhook processing isn't complex: the core pattern fits in 30 lines of code. What's hard is remembering you need it before the moment you discover you don't have it, and understanding that the deduplication layer is only half the story. The other half is graceful partial-failure handling so that a processing error doesn't lock you out of ever retrying an event.

Build this in from day one. Both the FastAPI and Next.js implementations above are meant as a starting point for production: wire in your own session handling, settings, and event processing, and adjust the signature header names and TTLs to match your specific provider. And configure your dead-letter alerts before you need them, not after your Monday morning.

The deduplication store is your safety net. Trust it, and your Monday mornings will be considerably quieter.

Damian Hodgkiss
