Idempotency and Retry Patterns for Payment Webhooks (Stripe) in Next.js and FastAPI
Build production-grade Stripe webhook handlers in Next.js and FastAPI that survive retries without double-charging customers. Idempotency keys and deduplication patterns explained.
Webhooks are deceptively simple until money is involved. Stripe retries a webhook endpoint with exponential back-off for up to three days if it doesn't receive a 2xx response — a sensible guarantee, but one that turns a non-idempotent handler into a liability. If your handler isn't idempotent, you'll double-charge customers, double-fulfil orders, or create duplicate subscription records. I've seen all three. Getting idempotency right separates production-grade integrations from liabilities.
Let's build it properly for both stacks.
Why Idempotency Matters More Than You Think
An idempotent operation produces the same result whether you run it once or a hundred times. For payment webhooks, that means receiving checkout.session.completed twice should result in exactly one fulfilled order — not two.
Stripe retries when your endpoint returns non-2xx, times out, or drops the connection. Network blips, deploy restarts, and cold starts all cause retries in practice. With up to three days of exponential back-off behind every failed delivery, assume your handler will receive duplicate events sooner or later.
The four real-world causes of duplicate delivery
| Cause | Why it happens |
|---|---|
| Non-2xx response | Your handler crashed, timed out, or returned an error |
| Deploy restart | Your server restarted mid-handler after persisting state |
| Cold start | Serverless function timed out before returning 200 |
| Stripe infrastructure | Stripe itself can deliver an event more than once |
The Core Pattern
Regardless of stack:
- Verify the Stripe signature before touching anything.
- Record the stripe_event_id in a database table with a unique constraint.
- Attempt the insert; if it conflicts, the event is a duplicate, so return 200 immediately.
- Process the event inside a transaction or with compensating logic.
- Mark the record as processed only after success.
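The insert-then-process steps above can be sketched in a few lines. This is a minimal illustration rather than the handler from either stack: it uses SQLite from the standard library so it runs anywhere, the column names follow the ones mentioned in this article, and `init_db`, `handle_once`, and the `process` callback are hypothetical names. Signature verification (the first step) is assumed to have happened already.

```python
import sqlite3
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    # The PRIMARY KEY is the unique constraint that detects duplicates.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS stripe_events (
               event_id     TEXT PRIMARY KEY,
               received_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
               processed_at TEXT
           )"""
    )


def handle_once(
    conn: sqlite3.Connection, event_id: str, process: Callable[[str], None]
) -> str:
    """Return "processed" on first delivery, "duplicate" on any later one."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO stripe_events (event_id) VALUES (?)", (event_id,)
            )
    except sqlite3.IntegrityError:
        # Unique constraint hit: this event id was already recorded.
        return "duplicate"

    # If process() raises, the exception propagates to the caller and
    # processed_at stays NULL for this row.
    with conn:
        process(event_id)
        conn.execute(
            "UPDATE stripe_events SET processed_at = CURRENT_TIMESTAMP "
            "WHERE event_id = ?",
            (event_id,),
        )
    return "processed"
```

The key design choice is that the database, not application memory, decides whether an event is new, so the check holds across restarts and multiple server instances.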
Signature Verification and Clock Skew
Stripe signs every webhook with an HMAC-SHA256 signature and includes a timestamp in the Stripe-Signature header. The stripe.webhooks.constructEvent (Node) and stripe.Webhook.construct_event (Python) helpers verify both the signature and that the timestamp is within a configurable tolerance window — 300 seconds (5 minutes) by default.
This tolerance guards against replay attacks: someone capturing a valid signed payload and replaying it hours later. The timestamp check means a replayed event with an old timestamp will be rejected automatically.
Clock skew gotcha: If your server's system clock drifts significantly from UTC, constructEvent will start throwing SignatureVerificationError on legitimately fresh events. Keep NTP synchronised on your servers; on Kubernetes, ensure the node clock is healthy. On AWS Lambda and Vercel this is managed for you.
You can customise the tolerance window, but don't increase it beyond a few minutes — doing so widens the replay-attack window:
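To make the role of the tolerance concrete, here is a simplified sketch of the kind of check the library helpers perform: an HMAC-SHA256 over the timestamp and raw payload, followed by an age check. It is an illustration only, not the Stripe library's code; `sign_payload` and `verify_signature` are hypothetical names, and in production you should call the official helper instead of rolling your own.

```python
import hashlib
import hmac
import time
from typing import Optional


def _hmac_hex(payload: bytes, secret: str, timestamp: int) -> str:
    # The signed message is "<timestamp>.<raw body>".
    signed = f"{timestamp}.".encode() + payload
    return hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()


def sign_payload(payload: bytes, secret: str, timestamp: int) -> str:
    """Build a header value in the t=...,v1=... shape (for tests only)."""
    return f"t={timestamp},v1={_hmac_hex(payload, secret, timestamp)}"


def verify_signature(
    payload: bytes,
    header: str,
    secret: str,
    tolerance: int = 300,
    now: Optional[float] = None,
) -> bool:
    pairs = [item.split("=", 1) for item in header.split(",") if "=" in item]
    timestamp = next((value for key, value in pairs if key == "t"), None)
    signatures = [value for key, value in pairs if key == "v1"]
    if timestamp is None or not timestamp.isdigit() or not signatures:
        return False

    expected = _hmac_hex(payload, secret, int(timestamp))
    # compare_digest avoids leaking information through timing differences.
    if not any(hmac.compare_digest(expected, sig) for sig in signatures):
        return False

    # Reject payloads older than the tolerance window (replay protection).
    current = time.time() if now is None else now
    return current - int(timestamp) <= tolerance
```

Because the timestamp is part of the signed message, an attacker cannot refresh it on a captured payload without invalidating the signature. A larger `tolerance` simply lets older captures through.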
Next.js Implementation
1. Disable the body parser
Stripe signature verification requires the raw request body. In Next.js App Router, request.text() gives you the raw bytes as a string — do not parse it as JSON first. Create app/api/webhooks/stripe/route.ts:
Transaction semantics: handleEvent() runs inside BEGIN…COMMIT. If handleEvent() throws (say, your database write for fulfilling an order fails), the ROLLBACK undoes any partial state and leaves processed_at as NULL. Stripe retries, the INSERT conflicts again (the initial row is already there), and your handler returns 200 without re-running business logic. That's correct for duplicate-delivery protection, but it means a permanently failing handler won't be retried after the three-day window. Log and alert on any 500 responses.
2. The events table
The PRIMARY KEY on event_id enforces uniqueness. The index on received_at makes the pruning query fast.
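As a rough sketch of such a table, the snippet below creates it together with the received_at index and a pruning query. It uses SQLite through Python's standard library purely so it is self-contained; the column types, the index name, and the helper names `create_schema` and `prune_old_events` are assumptions, and a production PostgreSQL schema would use its own types and date arithmetic.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS stripe_events (
    event_id     TEXT PRIMARY KEY,                      -- Stripe's evt_... id
    received_at  TEXT NOT NULL DEFAULT (datetime('now')),
    processed_at TEXT                                   -- NULL until handled
);
CREATE INDEX IF NOT EXISTS idx_stripe_events_received_at
    ON stripe_events (received_at);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def prune_old_events(conn: sqlite3.Connection, max_age_days: int = 90) -> int:
    """Delete rows older than max_age_days and return how many were removed."""
    with conn:  # commit the delete, or roll back if it fails
        cursor = conn.execute(
            "DELETE FROM stripe_events WHERE received_at < datetime('now', ?)",
            (f"-{max_age_days} days",),
        )
    return cursor.rowcount
```

The range comparison on received_at is what lets the index do the work, so the delete does not need to scan the whole table as it grows.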
FastAPI Implementation
The Python stripe library's Webhook.construct_event is synchronous — it doesn't do any I/O. Wrap database operations in proper async context and use async with for your session to keep FastAPI's async model consistent:
Async consistency: The async with async_session_factory() pattern uses SQLAlchemy's AsyncSession. Both the idempotency insert and the handle_event call are awaited, so the event loop is never blocked. The stripe.Webhook.construct_event call is CPU-only (HMAC verification), so it's fast enough to run synchronously without an asyncio.to_thread wrapper.
SQLAlchemy model
Handling Retries Gracefully
A few operational realities:
Return 200 for events you don't care about. If you handle only checkout.session.completed but Stripe sends customer.updated, return 200 — don't let Stripe retry indefinitely on an intentionally unhandled event type.
Don't do slow work synchronously. If fulfilment involves emails, resource provisioning, or third-party API calls, push to a queue (BullMQ in Next.js, Celery or ARQ in FastAPI) and return 200 immediately. Your webhook handler should persist intent and enqueue; nothing more.
Set realistic timeouts. Stripe's timeout window for webhooks is short — a few seconds. A database insert and an enqueue fit comfortably; full synchronous fulfilment often won't.
Prune old events. The stripe_events table will grow. Add a scheduled job to delete rows older than 30–90 days. Stripe won't retry beyond three days, so rows older than that are bookkeeping data only.
Distinguish retriable from non-retriable failures. Not every error warrants a Stripe retry:
| Failure type | Return code | Rationale |
|---|---|---|
| DB temporarily unavailable | 500 | Transient — retry makes sense |
| Permanent fulfilment logic error | 200 + alert | Retrying won't fix it; log and alert |
| Unknown event type | 200 | Intentional no-op |
| Invalid signature | 400 | Don't retry — bad request |
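One way to keep the table above honest in code is to make the mapping from failure type to status code explicit. The sketch below is an assumption-laden illustration: `TransientError`, `PermanentError`, and `status_for_delivery` are hypothetical names, and how you classify real exceptions into those two buckets depends on your own stack.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("stripe_webhook")


class TransientError(Exception):
    """Hypothetical marker: a retry might succeed (e.g. DB unavailable)."""


class PermanentError(Exception):
    """Hypothetical marker: retrying cannot help (e.g. a logic error)."""


def status_for_delivery(
    signature_ok: bool,
    event_type: str,
    handlers: Dict[str, Callable[[], None]],
) -> int:
    """Return the HTTP status code to send back to Stripe."""
    if not signature_ok:
        return 400  # bad request, nothing to retry

    handler = handlers.get(event_type)
    if handler is None:
        return 200  # intentionally unhandled event type

    try:
        handler()
    except TransientError:
        return 500  # ask Stripe to deliver again later
    except PermanentError:
        # Swallow for Stripe's purposes, but make sure a human hears about it.
        logger.exception("permanent failure handling %s", event_type)
        return 200
    return 200
```

Anything that is neither marker class propagates as an unhandled exception here, which most frameworks turn into a 500. That default leans towards retrying, so it is worth classifying known permanent failures deliberately.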
Testing the Idempotency Logic
Use the Stripe CLI to replay events locally:
Confirm your handler:
- Processes the event on the first delivery.
- Returns 200 on the second delivery without re-running business logic.
- Has exactly one row in stripe_events with processed_at set.
For FastAPI, the equivalent forward target:
Integration test skeleton (pytest)
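A skeleton along these lines covers the duplicate-delivery checks. It is a sketch under stated assumptions: `deliver` is a hypothetical stand-in for POSTing a signed event to your endpoint (in a real suite you would go through your framework's test client), and an in-memory SQLite database replaces the real one. The `test_` functions follow pytest's discovery convention and need no pytest-specific imports.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stripe_events ("
        "event_id TEXT PRIMARY KEY, processed_at TEXT)"
    )
    return conn


def deliver(conn: sqlite3.Connection, event_id: str, fulfilled: List[str]) -> int:
    """Hypothetical stand-in for one webhook delivery; returns a status code."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO stripe_events (event_id) VALUES (?)", (event_id,)
            )
    except sqlite3.IntegrityError:
        return 200  # duplicate: acknowledge without side effects
    with conn:
        fulfilled.append(event_id)  # the "business logic" under test
        conn.execute(
            "UPDATE stripe_events SET processed_at = CURRENT_TIMESTAMP "
            "WHERE event_id = ?",
            (event_id,),
        )
    return 200


def test_duplicate_delivery_runs_business_logic_once() -> None:
    conn, fulfilled = make_db(), []
    assert deliver(conn, "evt_test_1", fulfilled) == 200
    assert deliver(conn, "evt_test_1", fulfilled) == 200
    assert fulfilled == ["evt_test_1"]


def test_exactly_one_processed_row() -> None:
    conn, fulfilled = make_db(), []
    deliver(conn, "evt_test_1", fulfilled)
    deliver(conn, "evt_test_1", fulfilled)
    rows = conn.execute(
        "SELECT event_id FROM stripe_events WHERE processed_at IS NOT NULL"
    ).fetchall()
    assert rows == [("evt_test_1",)]
```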
Final Thoughts
The pattern itself is simple — a unique insert, a conflict check, return early. What makes it production-grade is applying it consistently before any business logic runs, being explicit about which failures should trigger a Stripe retry versus which should be swallowed, and handling clock skew before it bites you in a late-night deploy. The bugs I've seen most often aren't in the happy path — they're in the retry path that nobody tested. Lock this down early and you won't be debugging duplicate orders at 2 AM.
Damian Hodgkiss
Senior Staff Engineer at Sumo Group, leading development of AppSumo marketplace. Technical solopreneur with 25+ years of experience building SaaS products.