Idempotency and Retry Patterns for Payment Webhooks (Stripe) in Next.js and FastAPI

Build production-grade Stripe webhook handlers in Next.js and FastAPI that survive retries without double-charging customers. Idempotency keys and deduplication patterns explained.

Webhooks are deceptively simple until money is involved. If a live-mode endpoint doesn't return a 2xx response, Stripe retries the delivery with exponential back-off for up to three days. That is a sensible guarantee, but it means every handler will eventually see the same event more than once. If yours isn't idempotent, you'll double-charge customers, double-fulfil orders, or create duplicate subscription records. I've seen all three. Getting idempotency right is what separates a production-grade integration from a liability.

Let's build it properly for both stacks.


Why Idempotency Matters More Than You Think

An idempotent operation produces the same result whether you run it once or a hundred times. For payment webhooks, that means receiving checkout.session.completed twice should result in exactly one fulfilled order — not two.

Stripe retries when your endpoint returns non-2xx, times out, or drops the connection. Network blips, deploy restarts, and cold starts all cause retries in practice. Treat duplicate delivery as a certainty rather than an edge case: a single failed response can produce repeated deliveries for up to three days.

The four real-world causes of duplicate delivery

Cause | Why it happens
Non-2xx response | Your handler crashed, timed out, or returned an error
Deploy restart | Your server restarted mid-handler after persisting state
Cold start | Serverless function timed out before returning 200
Stripe infrastructure | Stripe itself can deliver an event more than once

The Core Pattern

Regardless of stack:

  1. Verify the Stripe signature before touching anything.
  2. Record the stripe_event_id in a database table with a unique constraint.
  3. Attempt the insert — if it conflicts, the event is a duplicate; return 200 immediately.
  4. Process the event inside a transaction or with compensating logic.
  5. Mark the record as processed only after success.
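Steps 2 to 5 can be sketched without any web framework. The example below is a minimal illustration using Python's built-in sqlite3 module and an in-memory database; the names make_db and process_once and the toy orders table are hypothetical, and signature verification (step 1) is left out. It assumes one particular design choice: the gate insert and the business logic share a single transaction, so a failed attempt leaves no row behind and a retry can run again.

```python
import sqlite3
from typing import Callable


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the dedup table and a toy orders table."""
    # isolation_level=None means autocommit mode, so BEGIN/COMMIT are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE stripe_events (event_id TEXT PRIMARY KEY, processed_at TEXT)"
    )
    conn.execute("CREATE TABLE orders (event_id TEXT)")
    return conn


def process_once(
    conn: sqlite3.Connection,
    event_id: str,
    handler: Callable[[sqlite3.Connection, str], None],
) -> str:
    """Run handler at most once per event_id; return what happened."""
    conn.execute("BEGIN")
    try:
        # Steps 2-3: the primary key rejects an event_id that is already recorded.
        conn.execute("INSERT INTO stripe_events (event_id) VALUES (?)", (event_id,))
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return "duplicate"  # the web layer would answer 200 here
    try:
        handler(conn, event_id)  # step 4: business logic in the same transaction
        conn.execute(
            "UPDATE stripe_events SET processed_at = datetime('now') WHERE event_id = ?",
            (event_id,),
        )  # step 5: mark as processed only after success
        conn.execute("COMMIT")
        return "processed"
    except Exception:
        # Undo the handler's writes and the gate row, so a retry is not
        # mistaken for a duplicate.
        conn.execute("ROLLBACK")
        return "failed"  # the web layer would answer 500 here
```

Calling process_once twice with the same event ID and a working handler yields "processed" and then "duplicate"; a handler that raises yields "failed" and leaves the table empty for the next attempt.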

Signature Verification and Clock Skew

Stripe signs every webhook with an HMAC-SHA256 signature and includes a timestamp in the Stripe-Signature header. The stripe.webhooks.constructEvent (Node) and stripe.Webhook.construct_event (Python) helpers verify both the signature and that the timestamp is within a configurable tolerance window — 300 seconds (5 minutes) by default.

This tolerance guards against replay attacks: someone capturing a valid signed payload and replaying it hours later. The timestamp check means a replayed event with an old timestamp will be rejected automatically.
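To make the timestamp check concrete, here is a rough sketch of what the helpers do under the hood, based on Stripe's documented scheme: the signed payload is the timestamp, a dot, and the raw body, and the header carries t= and v1= entries. The function names compute_signature and verify_signature_header are hypothetical and exist only for illustration; in a real handler, keep using the library helpers.

```python
import hashlib
import hmac
import time
from typing import List, Optional


def compute_signature(payload: bytes, secret: str, timestamp: int) -> str:
    """Hex-encoded HMAC-SHA256 over "<timestamp>.<payload>"."""
    signed_payload = f"{timestamp}.".encode() + payload
    return hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()


def verify_signature_header(
    payload: bytes,
    header: str,
    secret: str,
    tolerance: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Return True if the header carries a valid v1 signature that is not too old."""
    timestamp: Optional[str] = None
    candidates: List[str] = []
    for item in header.split(","):
        key, _, value = item.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    expected = compute_signature(payload, secret, int(timestamp))
    # compare_digest avoids leaking how many characters matched via timing.
    if not any(hmac.compare_digest(expected, c) for c in candidates):
        return False

    # Replay guard: reject signatures older than the tolerance window.
    current = time.time() if now is None else now
    return current - int(timestamp) <= tolerance
```

The now parameter is only there to make the age check easy to exercise: a header signed at t=1000 verifies at now=1100 and fails at now=1301 with the default 300-second tolerance. The same arithmetic shows why a server clock running several minutes fast makes fresh events look stale.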

Clock skew gotcha: If your server's system clock drifts significantly from UTC, the helpers will start rejecting legitimately fresh events with a signature verification error (StripeSignatureVerificationError in Node, SignatureVerificationError in Python). Keep NTP synchronised on your servers; on Kubernetes, ensure the node clock is healthy. On AWS Lambda and Vercel this is managed for you.

You can customise the tolerance window, but don't increase it beyond a few minutes — doing so widens the replay-attack window:

// Node.js: extend tolerance to 600 seconds (not recommended in production)
event = stripe.webhooks.constructEvent(body, signature, webhookSecret, 600);

# Python: extend tolerance to 600 seconds (not recommended in production)
event = stripe.Webhook.construct_event(
    payload, sig_header, settings.STRIPE_WEBHOOK_SECRET,
    tolerance=600
)

Next.js Implementation

1. Read the raw request body

Stripe signature verification requires the raw request body. In Next.js App Router, request.text() gives you the raw bytes as a string — do not parse it as JSON first. Create app/api/webhooks/stripe/route.ts:

import Stripe from 'stripe';
import { db } from '@/lib/db'; // assumed here to be a node-postgres (pg) Pool

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const webhookSecret = process.env.STRIPE_WEBHOOK_SECRET!;

export async function POST(request: Request) {
  const body = await request.text(); // raw body; must not be parsed first
  const signature = request.headers.get('stripe-signature');
  if (!signature) {
    return new Response('Missing signature', { status: 400 });
  }

  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(body, signature, webhookSecret);
  } catch {
    return new Response('Invalid signature', { status: 400 });
  }

  // Check out one connection so that BEGIN, the insert, handleEvent() and
  // COMMIT all run on the same session. Separate pool.query() calls may not.
  const client = await db.connect();
  try {
    await client.query('BEGIN');

    // Idempotency gate: unique constraint on event_id.
    // ON CONFLICT DO NOTHING affects zero rows if the event is already recorded.
    const gate = await client.query(
      `INSERT INTO stripe_events (event_id, type, received_at)
       VALUES ($1, $2, NOW())
       ON CONFLICT (event_id) DO NOTHING`,
      [event.id, event.type]
    );
    if (gate.rowCount === 0) {
      await client.query('ROLLBACK');
      return new Response('OK', { status: 200 }); // duplicate delivery
    }

    // handleEvent() must issue its writes through `client` to share the transaction.
    await handleEvent(event, client);
    await client.query(
      `UPDATE stripe_events SET processed_at = NOW() WHERE event_id = $1`,
      [event.id]
    );
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK').catch(() => {});
    console.error('Stripe webhook handler failed', event.id, err);
    // Return 500; Stripe will retry, and the rolled-back gate row lets the retry run.
    return new Response('Handler error', { status: 500 });
  } finally {
    client.release();
  }

  return new Response('OK', { status: 200 });
}

Transaction semantics: the idempotency insert, handleEvent() and the processed_at update all run inside one BEGIN…COMMIT on a single connection. If handleEvent() throws, say because the database write that fulfils an order fails, the ROLLBACK undoes the partial state and removes the stripe_events row along with it. Stripe retries, the INSERT succeeds again, and the business logic gets another attempt. If the insert were committed on its own before processing, the retry would hit the unique constraint and return 200 without ever running the handler, silently dropping the event. When two deliveries of the same event arrive at once, Postgres makes the second insert wait for the first transaction to finish, so only one of them runs the handler. A handler that fails on every attempt stops being retried once Stripe's three-day window closes, so log and alert on any 500 responses.

2. The events table

CREATE TABLE stripe_events (
    event_id     TEXT PRIMARY KEY,
    type         TEXT NOT NULL,
    received_at  TIMESTAMPTZ NOT NULL,
    processed_at TIMESTAMPTZ
);

CREATE INDEX idx_stripe_events_received_at ON stripe_events (received_at);

The PRIMARY KEY on event_id enforces uniqueness. The index on received_at makes the pruning query fast.


FastAPI Implementation

The Python stripe library's Webhook.construct_event is synchronous — it doesn't do any I/O. Wrap database operations in proper async context and use async with for your session to keep FastAPI's async model consistent:

import stripe
from datetime import datetime, timezone
from fastapi import FastAPI, Request, HTTPException
from sqlalchemy.exc import IntegrityError
from app.database import async_session_factory
from app.models import StripeEvent
from app import settings

app = FastAPI()

stripe.api_key = settings.STRIPE_SECRET_KEY

@app.post("/webhooks/stripe")
async def stripe_webhook(request: Request):
    payload = await request.body()
    sig_header = request.headers.get("stripe-signature")

    # construct_event is synchronous; no await needed
    try:
        event = stripe.Webhook.construct_event(
            payload, sig_header, settings.STRIPE_WEBHOOK_SECRET
        )
    except (ValueError, stripe.error.SignatureVerificationError):
        # ValueError: malformed payload; SignatureVerificationError: bad signature
        raise HTTPException(status_code=400, detail="Invalid payload or signature")

    async with async_session_factory() as db:
        # Idempotency gate: unique constraint on event_id.
        # flush() sends the INSERT without committing, so the gate row and the
        # handler's writes share one transaction.
        record = StripeEvent(event_id=event["id"], event_type=event["type"])
        db.add(record)
        try:
            await db.flush()
        except IntegrityError:
            await db.rollback()
            return {"status": "duplicate"}

        # Process the event; pass db so the handler shares the transaction
        try:
            await handle_event(event, db)
            record.processed_at = datetime.now(timezone.utc)
            await db.commit()
        except Exception:
            # Rolls back the gate row too, so Stripe's retry is not treated
            # as a duplicate and the handler gets another attempt.
            await db.rollback()
            raise HTTPException(status_code=500, detail="Handler failed")

    return {"status": "ok"}

Async consistency: The async with async_session_factory() pattern uses SQLAlchemy's AsyncSession. Both the idempotency insert and the handle_event call are awaited, so the event loop is never blocked. The stripe.Webhook.construct_event call is CPU-only (HMAC verification) — it's fast enough to run synchronously without an asyncio.to_thread wrapper.

SQLAlchemy model

from datetime import datetime, timezone

from sqlalchemy import Column, String, DateTime
from sqlalchemy.orm import DeclarativeBase

class Base(DeclarativeBase):
    pass

class StripeEvent(Base):
    __tablename__ = "stripe_events"

    event_id = Column(String, primary_key=True)
    # Stored in the "type" column so the model matches the stripe_events DDL above
    event_type = Column("type", String, nullable=False)
    received_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
    processed_at = Column(DateTime(timezone=True), nullable=True)

Handling Retries Gracefully

A few operational realities:

Return 200 for events you don't care about. If you handle only checkout.session.completed but Stripe sends customer.updated, return 200 — don't let Stripe retry indefinitely on an intentionally unhandled event type.

// Next.js: explicit allow-list
const HANDLED_EVENTS = new Set([
  'checkout.session.completed',
  'invoice.paid',
  'customer.subscription.deleted',
]);

if (!HANDLED_EVENTS.has(event.type)) {
  return new Response('Unhandled event type', { status: 200 });
}

# FastAPI: explicit allow-list
HANDLED_EVENTS = {
    "checkout.session.completed",
    "invoice.paid",
    "customer.subscription.deleted",
}

if event["type"] not in HANDLED_EVENTS:
    return {"status": "unhandled"}

Don't do slow work synchronously. If fulfilment involves emails, resource provisioning, or third-party API calls, push to a queue (BullMQ in Next.js, Celery or ARQ in FastAPI) and return 200 immediately. Your webhook handler should persist intent and enqueue; nothing more.
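The shape of "enqueue and return" can be shown with nothing but the standard library. In the sketch below an in-process asyncio.Queue stands in for a real broker, and acknowledge and worker are hypothetical names. Treat it as an illustration of the control flow only: an in-memory queue loses its jobs on restart, so a real setup would need a durable queue such as the ones named above.

```python
import asyncio
from typing import Dict, List


async def acknowledge(event: Dict[str, str], queue: asyncio.Queue) -> int:
    """Webhook side: hand the job to the queue and return the HTTP status at once."""
    await queue.put({"event_id": event["id"], "type": event["type"]})
    return 200


async def worker(queue: asyncio.Queue, completed: List[str]) -> None:
    """Worker side: do the slow part outside the request/response cycle."""
    while True:
        job = await queue.get()
        await asyncio.sleep(0.01)  # stands in for email, provisioning, API calls
        completed.append(job["event_id"])
        queue.task_done()


async def demo() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    completed: List[str] = []
    task = asyncio.create_task(worker(queue, completed))

    status = await acknowledge({"id": "evt_1", "type": "invoice.paid"}, queue)
    assert status == 200  # the response does not wait for the slow work

    await queue.join()  # wait for the worker here only so the demo can exit
    task.cancel()
    return completed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The handler's response time now depends only on the enqueue, not on how long fulfilment takes.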

Set realistic timeouts. Stripe's timeout window for webhooks is short — a few seconds. A database insert and an enqueue fit comfortably; full synchronous fulfilment often won't.

Prune old events. The stripe_events table will grow. Add a scheduled job to delete rows older than 30–90 days. Stripe won't retry beyond three days, so rows older than that are bookkeeping data only.

-- Run weekly via pg_cron or a scheduled job
DELETE FROM stripe_events
WHERE received_at < NOW() - INTERVAL '90 days';

Distinguish retriable from non-retriable failures. Not every error warrants a Stripe retry:

Failure type | Return code | Rationale
DB temporarily unavailable | 500 | Transient; retry makes sense
Permanent fulfilment logic error | 200 + alert | Retrying won't fix it; log and alert
Unknown event type | 200 | Intentional no-op
Invalid signature | 400 | Don't retry; bad request
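One way to keep that table honest in code is to classify failures with exception types and map them to a response in a single place. The sketch below is an assumption about how you might structure it: the exception classes and the status_for function are hypothetical names, and sending unknown errors down the retry path with an alert is a choice made here, not something the table prescribes.

```python
from typing import Optional, Tuple


class InvalidSignature(Exception):
    """The request did not carry a valid Stripe signature."""


class TransientError(Exception):
    """A failure that may succeed on retry, e.g. the database is unavailable."""


class PermanentError(Exception):
    """A failure that retrying cannot fix, e.g. a fulfilment logic error."""


def status_for(error: Optional[Exception]) -> Tuple[int, bool]:
    """Map a handler outcome to (HTTP status, whether to raise an alert)."""
    if error is None:
        return 200, False  # success or intentional no-op
    if isinstance(error, InvalidSignature):
        return 400, False  # bad request
    if isinstance(error, PermanentError):
        return 200, True   # swallow the retry, but make noise
    if isinstance(error, TransientError):
        return 500, False  # let Stripe retry
    # Unknown failures: assume retriable, and alert so they get classified.
    return 500, True
```

A handler would wrap its processing in try/except, pass the caught exception to status_for, and use the result to build the response and decide whether to page someone.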

Testing the Idempotency Logic

Use the Stripe CLI to replay events locally:

# Terminal 1 — forward Stripe events to your local server
stripe listen --forward-to localhost:3000/api/webhooks/stripe

# Terminal 2 — replay the same event twice
stripe events resend evt_123abc
stripe events resend evt_123abc

Confirm your handler:

  1. Processes the event on the first delivery.
  2. Returns 200 on the second delivery without re-running business logic.
  3. Has exactly one row in stripe_events with processed_at set.

For FastAPI, the equivalent forward target:

stripe listen --forward-to localhost:8000/webhooks/stripe

Integration test skeleton (pytest)

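import hashlib
import hmac
import json
import time

import pytest
from httpx import AsyncClient

# Must match settings.STRIPE_WEBHOOK_SECRET in the test environment
WEBHOOK_SECRET = "whsec_test_secret"

def sign(payload: bytes, secret: str) -> str:
    """Build a Stripe-Signature header by hand: HMAC-SHA256 over "<timestamp>.<payload>"."""
    timestamp = int(time.time())
    signed_payload = f"{timestamp}.".encode() + payload
    digest = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={digest}"

@pytest.mark.asyncio
async def test_duplicate_event_is_idempotent(async_client: AsyncClient):
    event = {
        "id": "evt_test_001",
        "type": "checkout.session.completed",
        # ... remaining event fields your handler reads
    }
    payload = json.dumps(event).encode()
    headers = {"stripe-signature": sign(payload, WEBHOOK_SECRET)}

    r1 = await async_client.post("/webhooks/stripe", content=payload, headers=headers)
    r2 = await async_client.post("/webhooks/stripe", content=payload, headers=headers)

    assert r1.status_code == 200
    assert r2.status_code == 200
    assert r2.json() == {"status": "duplicate"}
    # Assert your DB has exactly one processed row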

Final Thoughts

The pattern itself is simple — a unique insert, a conflict check, return early. What makes it production-grade is applying it consistently before any business logic runs, being explicit about which failures should trigger a Stripe retry versus which should be swallowed, and handling clock skew before it bites you in a late-night deploy. The bugs I've seen most often aren't in the happy path — they're in the retry path that nobody tested. Lock this down early and you won't be debugging duplicate orders at 2 AM.

Damian Hodgkiss

Senior Staff Engineer at Sumo Group, leading development of AppSumo marketplace. Technical solopreneur with 25+ years of experience building SaaS products.
