Idempotent Webhook Processing: Deduplication and Retry Handling in FastAPI and Next.js
Production-ready webhook handling with deduplication and retry logic in FastAPI and Next.js. Prevent duplicate charges and handle partial failures robustly.
Webhooks are deceptively simple until they bite you. A payment provider fires an event, your server hiccups, the provider retries, and suddenly you've charged a customer twice. I've seen this pattern cause real damage: not hypothetical damage, but real "refund queue on a Monday morning" damage. After 25 years of building SaaS products, I now reach for idempotent webhook processing before almost anything else.
This tutorial covers a production-ready approach to deduplication and retry handling, with concrete code for both a FastAPI backend and a Next.js API route. It also covers what happens when processing partially fails, how to build a robust background retry layer, and the per-provider specifics you need to configure correctly.
Why Webhooks Need Idempotency
HTTP is unreliable. Every major provider delivers events on an at-least-once basis — never exactly-once. Stripe's own documentation explicitly states that endpoints "might occasionally receive the same event more than once." AWS SQS Standard queues make the same guarantee. Treating delivery as exactly-once is a category error; build for duplicates by default.
The most common trigger is not a dramatic network failure. It's your handler finishing the work but responding a few milliseconds too late, so the provider's timeout fires, it assumes failure, and retries an operation that already succeeded. The retry loop then processes a completed event a second time.
Idempotency means processing the same event multiple times produces exactly the same result as processing it once. The mechanism is straightforward: record a unique identifier for each event before you act on it, and reject duplicates.
Provider Retry Schedules
Different providers have very different retry windows. Configure your deduplication TTLs and dead-letter queues against the specific provider you're integrating — not a single global number.
| Provider | Retry budget | Backoff strategy | Unique ID header/field | Notes |
|---|---|---|---|---|
| Stripe | Up to 3 days (live mode) | Exponential backoff | event.id in JSON body | Includes Stripe-Signature with timestamp for replay protection |
| Shopify | 8 attempts over ~4 hours | Fixed intervals | X-Shopify-Webhook-Id header | Subscription may be auto-deleted after 8 consecutive failures |
| Svix | ~8 attempts over ~1 day | Exponential | webhook-id header | Used by many platforms as their webhook infrastructure |
| GitHub | Up to 3 retries | ~1 min intervals | X-GitHub-Delivery header | Delivery ID is consistent across retries for the same event |
Key implication: Your deduplication store TTL must outlive the provider's full retry window. For Stripe, that means at minimum 3 days. For Shopify, 4–5 hours is sufficient. Using a hard-coded 24-hour TTL is fine as a default but will miss Stripe's late retries unless you bump it.
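One way to keep TTLs per provider rather than global is a small lookup table. This is a sketch: the windows mirror the table above, while RETRY_WINDOWS, MARGIN, and dedup_ttl_seconds are hypothetical names, and the 12-hour margin is an arbitrary buffer you should tune.

```python
from datetime import timedelta

# Approximate retry windows from the provider table; re-check them against
# each provider's current documentation before relying on them.
RETRY_WINDOWS = {
    "stripe": timedelta(days=3),
    "shopify": timedelta(hours=4),
    "svix": timedelta(days=1),
}

# Assumed buffer so a retry arriving at the very end of the window
# still finds its key in the store.
MARGIN = timedelta(hours=12)


def dedup_ttl_seconds(provider: str) -> int:
    """TTL (in seconds) for a dedup key that outlives the provider's retries.

    Unknown providers fall back to the longest known window, because a TTL
    that is too short silently lets duplicates through.
    """
    window = RETRY_WINDOWS.get(provider.lower(), max(RETRY_WINDOWS.values()))
    return int((window + MARGIN).total_seconds())
```

With Redis, the result can feed straight into SET ... EX <ttl>; with Postgres, it becomes the age threshold for a periodic cleanup job.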
The Core Pattern
- Verify the signature first, before touching the database.
- Extract a unique event ID from the incoming payload or headers.
- Attempt to insert that ID into a deduplication store (database or cache).
- If the insert succeeds, process the event.
- If the insert fails (duplicate key), return 200 immediately; do not reprocess.
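Returning 200 on a duplicate is intentional. You're telling the provider "I've handled this," which stops the retry loop. Returning 4xx or 5xx on a duplicate makes the provider keep retrying until its retry budget runs out, and some providers (Shopify, for example) may disable the subscription after repeated failures.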
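The steps above can be sketched without any web framework. This assumes the signature has already been checked (passed in as signature_ok) and uses an in-memory SQLite table as the deduplication store; handle_webhook and make_store are hypothetical names, and the returned integer stands in for the HTTP status your framework would send.

```python
import sqlite3
from typing import Callable


def make_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The PRIMARY KEY is what turns a second insert into an error.
    conn.execute("CREATE TABLE webhook_events (event_id TEXT PRIMARY KEY)")
    return conn


def handle_webhook(conn: sqlite3.Connection, event_id: str,
                   signature_ok: bool, process: Callable[[str], None]) -> int:
    """Return the HTTP status code the endpoint should send."""
    # Reject unauthenticated requests before any database work (401 is also common).
    if not signature_ok:
        return 400
    # Claim the event ID; the unique constraint makes this check-and-set atomic.
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO webhook_events (event_id) VALUES (?)", (event_id,))
    except sqlite3.IntegrityError:
        # Duplicate delivery: acknowledge so the provider stops retrying.
        return 200
    # First delivery: do the real work.
    process(event_id)
    return 200
```

Note that if process raises here, the ID stays claimed even though the work never finished; that gap is the partial-failure problem covered later in this article.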
FastAPI Implementation
Setting Up the Deduplication Table
Using PostgreSQL:
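A minimal sketch of such a table follows. The column names (event_id, provider, payload, status, received_at, processed_at) are illustrative assumptions. The DDL is written to run unchanged on PostgreSQL and SQLite; SQLite is used here only so the example runs without a database server.

```python
import sqlite3

SCHEMA = """
CREATE TABLE webhook_events (
    event_id     TEXT PRIMARY KEY,   -- provider's ID, e.g. Stripe event.id
    provider     TEXT NOT NULL,
    payload      TEXT NOT NULL,      -- raw body; JSONB is a natural choice in Postgres
    status       TEXT NOT NULL DEFAULT 'pending'
                 CHECK (status IN ('pending', 'processed', 'failed', 'dead_letter')),
    received_at  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    processed_at TIMESTAMPTZ
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

insert = "INSERT INTO webhook_events (event_id, provider, payload) VALUES (?, ?, ?)"
conn.execute(insert, ("evt_123", "stripe", "{}"))

duplicate_rejected = False
try:
    conn.execute(insert, ("evt_123", "stripe", "{}"))
except sqlite3.IntegrityError:
    # PostgreSQL reports the same situation as unique_violation (SQLSTATE 23505).
    duplicate_rejected = True
```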
The PRIMARY KEY constraint does the heavy lifting — attempting to insert a duplicate event_id raises a unique violation. The status column supports the partial-failure recovery pattern described below.
The FastAPI Endpoint
Always verify the signature before hitting the database — it's cheap and keeps bad actors from polluting your deduplication store. Use hmac.compare_digest rather than == to avoid timing attacks.
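The endpoint's signature check can be sketched with the standard library alone. This assumes a provider that sends a hex-encoded HMAC-SHA256 of the raw request body in a header; verify_hmac_signature is a hypothetical helper, and header names and encodings differ between providers. In a FastAPI route you would pass it the bytes from await request.body(), not the parsed JSON, because re-serialising changes the bytes.

```python
import hashlib
import hmac


def verify_hmac_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Check a hex-encoded HMAC-SHA256 of the raw request body."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest's running time does not depend on where the inputs differ,
    # unlike ==, which returns at the first mismatching character. Comparing
    # bytes also avoids a TypeError if the header contains non-ASCII text.
    return hmac.compare_digest(expected.encode(), signature_header.strip().encode())
```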
Stripe-Specific Signature Verification (FastAPI)
Stripe prepends a timestamp to the signed payload — it's not a simple body hash. Use the official stripe library to handle this correctly:
The construct_event call validates both the HMAC and the timestamp tolerance (default: 300 seconds), protecting against replay attacks automatically.
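To make concrete what construct_event checks, here is a stdlib-only sketch of Stripe's documented scheme: the Stripe-Signature header carries t=<timestamp> and one or more v1=<signature> entries, and each signature is an HMAC-SHA256 of "<timestamp>.<raw body>" keyed with the endpoint secret. verify_stripe_signature is a hypothetical helper for illustration; in production, prefer stripe.Webhook.construct_event(payload, sig_header, endpoint_secret), which raises an exception when verification fails.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(payload: bytes, sig_header: str, secret: str,
                            tolerance: int = 300, now: Optional[float] = None) -> bool:
    """Sketch of Stripe's signing scheme: HMAC-SHA256 over "<t>.<raw body>"."""
    timestamp = None
    candidates = []
    for part in sig_header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False
    # Replay protection: refuse signatures older than `tolerance` seconds.
    current = time.time() if now is None else now
    if current - int(timestamp) > tolerance:
        return False
    signed = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest().encode()
    # Several v1 entries can appear, e.g. while an endpoint secret is being rolled.
    return any(hmac.compare_digest(expected, c.encode()) for c in candidates)
```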
Next.js Implementation
Next.js API routes are stateless, so you need an external store. Redis works well for short-lived deduplication windows; Postgres works if you want a permanent audit trail.
Using Redis for Deduplication
SET NX is atomic in Redis, making it safe under concurrent retries without additional locking.
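In the Next.js route the guard is a single Redis command, SET webhook:<id> 1 NX EX <ttl>, which replies OK when the key was claimed and nil when it already exists. The toy model below (InMemorySetNX is a hypothetical class, not a Redis client) illustrates the semantics the handler relies on: one winner among concurrent deliveries, and a key that stops blocking once its TTL lapses. Redis does the check-and-set as one atomic command; here a lock plays that role.

```python
import threading
import time
from typing import Callable, Dict


class InMemorySetNX:
    """Toy stand-in for Redis `SET key value NX EX ttl`, for illustration only."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._expiry: Dict[str, float] = {}
        self._lock = threading.Lock()
        self._clock = clock

    def set_nx_ex(self, key: str, ttl_seconds: int) -> bool:
        """Return True if the key was claimed, False if it is still held."""
        with self._lock:  # check and set happen as one indivisible step
            now = self._clock()
            expires = self._expiry.get(key)
            if expires is not None and expires > now:
                return False
            self._expiry[key] = now + ttl_seconds
            return True
```

The expiry behaviour is also why the TTL must outlive the provider's retry window: once the key lapses, a late retry is treated as a brand-new event.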
Shopify-Specific Header (Next.js)
Shopify sends its unique delivery ID in the X-Shopify-Webhook-Id header, not the JSON body. Use that as your deduplication key:
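Because each provider puts its ID somewhere different, it can help to centralise the lookup. The logic is language-agnostic; this Python sketch uses the locations from the provider table, and dedup_key and HEADER_KEYS are hypothetical names. Header names are case-insensitive in HTTP (Node.js, for instance, lowercases incoming header names), so the sketch normalises them before the lookup, and it prefixes each key with the provider so IDs from different sources cannot collide.

```python
import json
from typing import Mapping, Optional

# Header that carries the unique delivery/event ID, per the provider table.
HEADER_KEYS = {
    "shopify": "x-shopify-webhook-id",
    "svix": "webhook-id",
    "github": "x-github-delivery",
}


def dedup_key(provider: str, headers: Mapping[str, str], raw_body: bytes) -> Optional[str]:
    """Build a namespaced dedup key, or None if the ID is missing."""
    provider = provider.lower()
    if provider == "stripe":
        # Stripe carries the ID in the JSON body as event.id.
        body = json.loads(raw_body)
        event_id = body.get("id") if isinstance(body, dict) else None
    else:
        lowered = {k.lower(): v for k, v in headers.items()}
        event_id = lowered.get(HEADER_KEYS.get(provider, ""))
    return f"{provider}:{event_id}" if event_id else None
```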
Handling Partial Failures
This is the part most guides skip, and it's where real production incidents happen.
The failure mode: you've recorded the event ID, so the deduplication guard will turn away every future delivery of that event, but processing throws an exception halfway through. You now have a partially applied event in your system and no way to retry it through the webhook handler, because the deduplication layer blocks future attempts.
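The solution is a two-layer status machine. The webhook handler records each event as pending and marks it processed or failed; a separate background worker retries failed events with backoff and moves them to dead_letter once they exceed a retry limit.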
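The allowed transitions can be written down explicitly, which makes illegal jumps (for example, reprocessing an already processed event) fail loudly. This is a sketch: the Status enum and the ALLOWED table are hypothetical names, and the DEAD_LETTER → PENDING edge assumes that manual replay works by re-queuing the event.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    PROCESSED = "processed"
    FAILED = "failed"
    DEAD_LETTER = "dead_letter"


# Layer 1 (webhook handler): PENDING -> PROCESSED | FAILED.
# Layer 2 (background worker): FAILED -> PROCESSED | FAILED | DEAD_LETTER,
# and it may also finish PENDING rows orphaned by a crash.
ALLOWED = {
    Status.PENDING: {Status.PROCESSED, Status.FAILED},
    Status.FAILED: {Status.PROCESSED, Status.FAILED, Status.DEAD_LETTER},
    Status.PROCESSED: set(),               # terminal
    Status.DEAD_LETTER: {Status.PENDING},  # only via a manual replay
}


def transition(current: Status, new: Status) -> Status:
    """Validate a status change before writing it to the database."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

In SQL the same guard can live in the UPDATE itself, e.g. WHERE event_id = ? AND status = 'failed', so a concurrent writer cannot skip a state.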
Background Retry Worker (FastAPI / Python)
Add the retry_count and last_attempted_at columns to your schema:
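A sketch of the migration and a single retry pass, using SQLite so it runs standalone (the statements are also valid PostgreSQL, though psycopg uses %s placeholders rather than ?). It assumes the webhook_events table stores the raw payload and that the webhook handler sets last_attempted_at when it first inserts a row; MAX_RETRIES, BASE_DELAY_SECONDS, STALE_PENDING_SECONDS, is_due, and run_retry_pass are hypothetical names with illustrative values. Stale pending rows are included so events orphaned by a restart get picked up too.

```python
import sqlite3
import time
from typing import Callable, Optional

# Illustrative tuning knobs (assumptions, not recommendations).
MAX_RETRIES = 5
BASE_DELAY_SECONDS = 60        # backoff doubles: 60s, 120s, 240s, ...
STALE_PENDING_SECONDS = 300    # 'pending' this long is presumed orphaned


def migrate(conn: sqlite3.Connection) -> None:
    """Add the retry bookkeeping columns."""
    conn.execute(
        "ALTER TABLE webhook_events ADD COLUMN retry_count INTEGER NOT NULL DEFAULT 0"
    )
    # Epoch seconds keep the sketch simple; TIMESTAMPTZ is more idiomatic in Postgres.
    conn.execute(
        "ALTER TABLE webhook_events ADD COLUMN last_attempted_at DOUBLE PRECISION"
    )


def is_due(status: str, retry_count: int, last: Optional[float], now: float) -> bool:
    if last is None:
        return True  # never attempted, e.g. rows created before this migration
    if status == "pending":
        return now - last >= STALE_PENDING_SECONDS
    return now - last >= BASE_DELAY_SECONDS * 2 ** retry_count


def run_retry_pass(
    conn: sqlite3.Connection,
    process: Callable[[str, str], None],
    now: Optional[float] = None,
) -> int:
    """Retry every due 'failed' or orphaned 'pending' event once.

    Assumes a single worker. With several workers on Postgres you would
    claim rows with SELECT ... FOR UPDATE SKIP LOCKED instead.
    """
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT event_id, payload, status, retry_count, last_attempted_at "
        "FROM webhook_events WHERE status IN ('failed', 'pending')"
    ).fetchall()
    attempted = 0
    for event_id, payload, status, retry_count, last in rows:
        if not is_due(status, retry_count, last, now):
            continue
        attempted += 1
        try:
            process(event_id, payload)
        except Exception:
            retries = retry_count + 1
            new_status = "dead_letter" if retries >= MAX_RETRIES else "failed"
            conn.execute(
                "UPDATE webhook_events SET status = ?, retry_count = ?, "
                "last_attempted_at = ? WHERE event_id = ?",
                (new_status, retries, now, event_id),
            )
        else:
            conn.execute(
                "UPDATE webhook_events SET status = 'processed', "
                "last_attempted_at = ? WHERE event_id = ?",
                (now, event_id),
            )
        conn.commit()  # persist each outcome before moving on
    return attempted
```

Running run_retry_pass on a schedule (cron, APScheduler, a Celery beat task) gives you the retry loop the provider no longer has to provide.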
Why Return 200 Even on Processing Failure?
This is counter-intuitive but critical. When your handler fails partway through and you return 5xx, the provider retries, but your deduplication layer now blocks the retry because the event ID is already recorded with status = 'failed'. You've created a deadlock: the provider's retries can never get past your deduplication layer, so the event is never completed however many times it is redelivered.
The correct approach: return 200 to stop the provider retry loop, then use your own internal worker to retry with proper backoff and visibility. You control the retry schedule, can alert on dead-letter events, and can manually replay specific events without touching provider dashboards.
The Accept-Then-Queue Architecture
For high-throughput or slow downstream systems, consider decoupling receipt from processing entirely:
In FastAPI, this maps cleanly to BackgroundTasks for lightweight cases, or Celery/ARQ for durable queues:
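The shape of accept-then-queue can be shown with a stdlib queue and worker thread standing in for BackgroundTasks or a broker. In FastAPI the route would instead call background_tasks.add_task(process_event, event_id). AcceptThenQueue and its methods are hypothetical, the dict stands in for the status column, and like BackgroundTasks this is not durable: it only illustrates where the status update belongs.

```python
import queue
import threading
from typing import Callable, Dict


class AcceptThenQueue:
    """In-process stand-in for the receive/process split (not durable)."""

    def __init__(self, process: Callable[[str, dict], None]):
        self.status: Dict[str, str] = {}
        self._lock = threading.Lock()
        self._queue: queue.Queue = queue.Queue()
        self._process = process
        threading.Thread(target=self._worker, daemon=True).start()

    def receive(self, event_id: str, payload: dict) -> int:
        """Webhook endpoint: record + enqueue, then return 200 right away."""
        with self._lock:
            if event_id in self.status:
                return 200  # duplicate delivery
            self.status[event_id] = "pending"
        self._queue.put((event_id, payload))
        return 200

    def wait_idle(self) -> None:
        """Block until every queued event has been handled."""
        self._queue.join()

    def _worker(self) -> None:
        while True:
            event_id, payload = self._queue.get()
            try:
                self._process(event_id, payload)
                outcome = "processed"
            except Exception:
                outcome = "failed"  # left for the retry worker
            with self._lock:
                # The status change happens inside the task: a crash before
                # this point leaves the event 'pending' for the retry worker.
                self.status[event_id] = outcome
            self._queue.task_done()
```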
Caveat: BackgroundTasks runs in the same process. If the server restarts before the task completes, you'll have a pending event in the database that your retry worker can pick up, but only if you update status inside the background task, not before queuing it. For true durability, use a persistent queue (Celery with Redis/RabbitMQ, or ARQ).
Making Downstream Operations Idempotent
The deduplication layer prevents double-processing of the same webhook event, but your business logic also needs to be idempotent. Consider:
- Database upserts over inserts: INSERT ... ON CONFLICT (order_id) DO NOTHING prevents double-created orders even if your deduplication layer has a race condition at startup.
- Idempotency keys on outbound API calls: when a webhook triggers a Stripe charge or a SendGrid email, pass an idempotency key (the webhook event ID works well) wherever the downstream API supports one, so that call is idempotent too.
- Conditional updates: use UPDATE orders SET status = 'paid' WHERE status = 'pending' AND id = :order_id rather than an unconditional update. If the record is already paid, the update is a no-op.
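The upsert and conditional-update bullets can be exercised against SQLite, which accepts the same ON CONFLICT ... DO NOTHING syntax as PostgreSQL (SQLite 3.24 or later). The orders table and the create_order / mark_paid helpers are hypothetical; each returns whether the statement actually changed a row, so a replayed webhook can tell that it was a no-op.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")


def create_order(order_id: str) -> bool:
    # Insert-or-ignore: a second call for the same order changes nothing.
    cur = conn.execute(
        "INSERT INTO orders (id, status) VALUES (?, 'pending') ON CONFLICT (id) DO NOTHING",
        (order_id,),
    )
    return cur.rowcount == 1


def mark_paid(order_id: str) -> bool:
    # Conditional update: only moves pending -> paid, so replays are no-ops.
    cur = conn.execute(
        "UPDATE orders SET status = 'paid' WHERE status = 'pending' AND id = ?",
        (order_id,),
    )
    return cur.rowcount == 1
```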
Quick Checklist
- Verify the signature before any database work; use hmac.compare_digest / crypto.timingSafeEqual
- Use the provider's event ID, not your own generated ID
- Match TTLs to the provider's retry window (Stripe: 3 days; Shopify: ~5 hours)
- Return 200 on duplicates and on processing failures; let your internal worker handle retries
- Track status (pending → processed | failed → dead_letter) so partial failures are recoverable
- Log duplicates: a sustained spike signals upstream issues (provider instability, your endpoint returning 5xx)
- Make downstream operations idempotent themselves, not just the intake layer
- Alert on dead_letter events; these need human review
Final Thoughts
Idempotent webhook processing isn't complex; the core pattern fits in 30 lines of code. What's hard is remembering that you need it before an incident proves it, and understanding that the deduplication layer is only half the story. The other half is graceful partial-failure handling, so that a processing error doesn't lock you out of ever retrying an event.
Build this in from day one. Both the FastAPI and Next.js implementations above are production-deployable as-is; adjust the signature header names and TTLs to match your specific provider. And configure your dead-letter alerts before you need them — not after your Monday morning.
The deduplication store is your safety net. Trust it, and your Monday mornings will be considerably quieter.
Damian Hodgkiss
Senior Staff Engineer at Sumo Group, leading development of AppSumo marketplace. Technical solopreneur with 25+ years of experience building SaaS products.