Feature Flags and Progressive Rollouts in a Django + FastAPI + Next.js Stack

If you've ever pushed a feature straight to production and watched error rates climb, you already understand why feature flags exist. They decouple deployment from release—one of the most valuable architectural decisions in a modern stack.

This tutorial walks through a practical, production-ready implementation across Django + FastAPI + Next.js. It covers the flag store, consistent bucketing, cache invalidation, anonymous users, and the full rollout workflow. It's concrete and copy-paste-ready.

The Architecture at a Glance

In a typical stack, you have three layers with distinct concerns:

Django — primary application server, business logic, auth, admin. Acts as the source of truth for flag configuration.
FastAPI — lightweight service layer for high-throughput endpoints or async work. Evaluates flags and caches results.
Next.js — frontend needing flag state on both server (SSR/RSC) and client.

The cleanest approach is a centralised flag store queried by all three layers. Keep this in PostgreSQL (which you almost certainly have already), backed by an in-memory or Redis cache to avoid hammering the database on every request.

┌──────────────┐        ┌─────────────┐       ┌───────────────┐
│   Next.js    │───────▶│   FastAPI   │──────▶│    Django     │
│  (SSR/RSC)   │        │  (evaluate) │       │  (flag store) │
└──────────────┘        └─────────────┘       └───────┬───────┘
                              │                        │
                              ▼                        ▼
                         ┌─────────┐           ┌────────────┐
                         │  Redis  │           │ PostgreSQL │
                         │ (cache) │           │   (flags)  │
                         └─────────┘           └────────────┘

The Flag Store: Django as the Source of Truth

Define a simple model:

# flags/models.py
from django.db import models

class FeatureFlag(models.Model):
    key = models.SlugField(unique=True)
    enabled = models.BooleanField(default=False)
    rollout_percentage = models.PositiveSmallIntegerField(default=0)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self):
        return self.key

rollout_percentage of 0 means off for everyone; 100 means on for everyone. Anything between enables progressive rollout. Register it in the Django admin for an instant UI.
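
The model carries both enabled and rollout_percentage, but only the percentage's meaning is spelled out above. One plausible way to combine them is sketched below, assuming enabled acts as a master switch that overrides the percentage (an assumption, not something the model enforces). FlagConfig and effective_percentage are hypothetical names, and the dataclass is a plain stand-in for the Django model so the snippet runs without Django.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagConfig:
    """Plain stand-in for a FeatureFlag row (hypothetical helper, not the Django model)."""
    key: str
    enabled: bool
    rollout_percentage: int

    def __post_init__(self) -> None:
        # PositiveSmallIntegerField accepts values above 100, so guard the range explicitly.
        if not 0 <= self.rollout_percentage <= 100:
            raise ValueError("rollout_percentage must be between 0 and 100")


def effective_percentage(flag: FlagConfig) -> int:
    """Assumed semantics: a disabled flag is off for everyone, whatever its percentage."""
    return flag.rollout_percentage if flag.enabled else 0
```

If you adopt this reading, the evaluator would bucket against effective_percentage(flag) rather than the raw field. In the Django model itself, the same range check could be expressed with a field validator.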

Adding an Audit Log

In production, you need to know who changed a flag and when. Add a simple history model:

# flags/models.py (extended)
from django.conf import settings

class FeatureFlagHistory(models.Model):
    flag = models.ForeignKey(FeatureFlag, on_delete=models.CASCADE, related_name='history')
    # Reference the user model via the setting, as Django recommends for foreign keys
    changed_by = models.ForeignKey(settings.AUTH_USER_MODEL, null=True, on_delete=models.SET_NULL)
    old_percentage = models.PositiveSmallIntegerField()
    new_percentage = models.PositiveSmallIntegerField()
    changed_at = models.DateTimeField(auto_now_add=True)
    note = models.TextField(blank=True)

    class Meta:
        ordering = ['-changed_at']

Then log changes via a Django signal. Import flags.signals from your AppConfig.ready() so the receiver is actually registered:

# flags/signals.py
from django.db.models.signals import pre_save
from django.dispatch import receiver
from .models import FeatureFlag, FeatureFlagHistory

@receiver(pre_save, sender=FeatureFlag)
def log_flag_change(sender, instance, **kwargs):
    if not instance.pk:
        return  # New flag, nothing to diff
    try:
        old = FeatureFlag.objects.get(pk=instance.pk)
    except FeatureFlag.DoesNotExist:
        return
    if old.rollout_percentage != instance.rollout_percentage:
        FeatureFlagHistory.objects.create(
            flag=instance,
            changed_by=getattr(instance, '_changed_by', None),
            old_percentage=old.rollout_percentage,
            new_percentage=instance.rollout_percentage,
        )

Set flag._changed_by = request.user in your admin action or API view before saving.

Evaluating Flags: Consistent User Bucketing

Progressive rollouts require the same user to see the same experience across requests. Use a deterministic hash of user ID and flag key:

# flags/utils.py
import hashlib

def is_flag_enabled(flag_key: str, user_id: int | str, rollout_percentage: int) -> bool:
    if rollout_percentage == 0:
        return False
    if rollout_percentage == 100:
        return True

    hash_input = f"{flag_key}:{user_id}".encode()
    hash_int = int(hashlib.md5(hash_input).hexdigest(), 16)
    bucket = hash_int % 100
    return bucket < rollout_percentage

MD5 is fine here—you're bucketing, not doing cryptography. The critical property: flag_key:user_id always hashes to the same bucket, so the same user never flickers between the old and new experience mid-session.

Including flag_key in the hash input is important. Without it, a user in bucket 12 would be in bucket 12 for every flag, so the same users would always be first into every rollout. That correlates experiments with each other and skews any A/B analysis.
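
The properties claimed above can be checked empirically. The sketch below reuses the same hashing scheme on synthetic user IDs (the IDs and flag names are illustrative only) and looks at three things: ramping up only ever adds users, the enabled share tracks the configured percentage, and two flags at 50% overlap on roughly a quarter of users rather than all of them.

```python
import hashlib


def bucket_for(flag_key: str, user_id: str) -> int:
    """Same scheme as is_flag_enabled: md5 of "flag_key:user_id", modulo 100."""
    digest = hashlib.md5(f"{flag_key}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def enabled_users(flag_key: str, user_ids: list[str], percentage: int) -> set[str]:
    """Users whose bucket falls below the rollout percentage."""
    return {u for u in user_ids if bucket_for(flag_key, u) < percentage}


users = [f"user-{i}" for i in range(10_000)]  # synthetic IDs, for illustration only

at_10 = enabled_users("new_dashboard", users, 10)
at_25 = enabled_users("new_dashboard", users, 25)
dash_50 = enabled_users("new_dashboard", users, 50)
editor_50 = enabled_users("beta_editor", users, 50)

# Ramping up only adds users: everyone enabled at 10% is still enabled at 25%.
print(at_10 <= at_25)
# The enabled share should sit close to the configured percentage.
print(round(len(at_25) / len(users), 2))
# With the flag key in the hash, two 50% rollouts overlap on about a quarter of users.
print(round(len(dash_50 & editor_50) / len(users), 2))
```

The subset check holds by construction (bucket < 10 implies bucket < 25). The two ratios are statistical, so expect small deviations from the nominal values.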

Handling Anonymous Users

Not every user has a logged-in ID. For anonymous visitors, you have two solid options:

Option 1: Stable anonymous ID via cookie. Generate a UUID on first visit, persist it in a long-lived cookie (samesite=Lax, httpOnly), and use it as the bucketing key. The experience is consistent across page loads until the cookie is cleared.

// middleware.ts (Next.js)
import { NextRequest, NextResponse } from 'next/server';
import { v4 as uuidv4 } from 'uuid';

export function middleware(request: NextRequest) {
  const response = NextResponse.next();
  if (!request.cookies.get('anon_id')) {
    response.cookies.set('anon_id', uuidv4(), {
      maxAge: 60 * 60 * 24 * 365, // 1 year
      sameSite: 'lax',
      httpOnly: true,
    });
  }
  return response;
}

Pass anon_id as the user_id to your flag evaluation endpoint when no authenticated user exists.

Option 2: Fall back to a default. If anonymous consistency doesn't matter (e.g., you only want to flag logged-in features), simply return false when no user ID is available. Document this explicitly: silent fall-throughs become bugs when someone assumes anonymous users can see a flag.

The key rule: never mix anonymous IDs and authenticated IDs in the same flag namespace without aliasing them. If a user logs in mid-session, decide explicitly whether they keep their anonymous bucket or inherit their authenticated bucket.

Exposing Flags via FastAPI with Redis Caching

The naïve implementation uses lru_cache, which works but has two problems in production: it never invalidates, and it's process-local (each worker holds a separate copy). The right answer is Redis with a TTL:

# main.py (FastAPI)
import json
import hashlib
import httpx
import redis.asyncio as aioredis
from fastapi import FastAPI, Depends

app = FastAPI()
redis_client = aioredis.from_url("redis://localhost:6379", decode_responses=True)

CACHE_TTL = 30  # seconds

async def get_flag_config(flag_key: str) -> dict:
    cache_key = f"flag:{flag_key}"
    cached = await redis_client.get(cache_key)
    if cached:
        return json.loads(cached)

    async with httpx.AsyncClient() as client:
        response = await client.get(f"http://django-internal/api/flags/{flag_key}/")
        response.raise_for_status()
        data = response.json()

    await redis_client.setex(cache_key, CACHE_TTL, json.dumps(data))
    return data

def is_flag_enabled(flag_key: str, user_id: str, rollout_percentage: int) -> bool:
    if rollout_percentage == 0:
        return False
    if rollout_percentage == 100:
        return True
    hash_int = int(hashlib.md5(f"{flag_key}:{user_id}".encode()).hexdigest(), 16)
    return (hash_int % 100) < rollout_percentage

@app.get("/flags/{flag_key}/evaluate")
async def evaluate_flag(flag_key: str, user_id: str):
    try:
        config = await get_flag_config(flag_key)
    except Exception:
        return {"flag": flag_key, "enabled": False}  # Fail closed
    enabled = is_flag_enabled(flag_key, user_id, config["rollout_percentage"])
    return {"flag": flag_key, "enabled": enabled}
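
The read path in get_flag_config is a cache-aside lookup with a TTL. The sketch below isolates that pattern, with an in-memory dict standing in for Redis and an injectable clock so the expiry behaviour can be exercised without a server. TTLCache and its loader callback are hypothetical names, not part of the service above.

```python
import time
from typing import Callable


class TTLCache:
    """In-memory stand-in for the Redis GET/SETEX pair (illustrative only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], dict]) -> dict:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # cache hit, still fresh
        value = loader(key)  # miss or expired: go to the source of truth
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry immediately, as a push invalidation would."""
        self._entries.pop(key, None)
```

With ttl_seconds=30, a second lookup 29 seconds after the first is served from the cache, while a lookup at 30 seconds or later calls the loader again. Unlike the Redis version, this cache is process-local, which is exactly the limitation the Redis approach avoids.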

Cache Invalidation: Push vs. Poll

A 30-second TTL means flag changes can take up to 30 seconds to propagate. That's usually acceptable for a gradual rollout, but for an emergency kill-switch you want immediate propagation.

The clean solution: invalidate the Redis key from Django the moment a flag is saved.

# flags/signals.py (extended)
from django.core.cache import cache
from django.db.models.signals import post_save

@receiver(post_save, sender=FeatureFlag)
def invalidate_flag_cache(sender, instance, **kwargs):
    # Delete from Django's cache (if you use django-redis for the Django layer).
    # Django's cache backends prefix keys, so this does not touch the
    # "flag:<key>" entry that FastAPI writes through its own Redis client.
    cache.delete(f"flag:{instance.key}")
    # If FastAPI has its own Redis connection, publish an invalidation event
    # so every FastAPI worker clears its local state

For a multi-process FastAPI deployment, use Redis Pub/Sub to broadcast the invalidation:

# Django side — publish on save
import redis

def invalidate_fastapi_flag_cache(flag_key: str):
    r = redis.Redis.from_url("redis://localhost:6379")
    r.publish("flag_invalidations", flag_key)

# FastAPI side — subscribe and clear
import asyncio

async def listen_for_invalidations():
    pubsub = redis_client.pubsub()
    await pubsub.subscribe("flag_invalidations")
    async for message in pubsub.listen():
        if message["type"] == "message":
            flag_key = message["data"]
            await redis_client.delete(f"flag:{flag_key}")

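_listener_task = None  # hold a reference so the background task is not garbage-collected

@app.on_event("startup")
async def startup():
    global _listener_task
    _listener_task = asyncio.create_task(listen_for_invalidations())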

This gives you near-instant propagation (milliseconds) on explicit saves, with TTL as a safety net for anything that slips through.

Consuming Flags in Next.js

Server Components (App Router)

Fetch flags during render so they're baked into the HTML before it ships to the browser:

// lib/flags.ts
export async function getFlagForUser(flagKey: string, userId: string): Promise<boolean> {
  const res = await fetch(
    `${process.env.INTERNAL_API_URL}/flags/${encodeURIComponent(flagKey)}/evaluate?user_id=${encodeURIComponent(userId)}`,
    { next: { revalidate: 30 } }
  );
  if (!res.ok) return false; // Fail closed
  const data = await res.json();
  return data.enabled ?? false;
}

{ next: { revalidate: 30 } } tells Next.js to re-fetch the flag from FastAPI at most every 30 seconds, consistent with your Redis TTL. Notice return false on error—always fail closed. An unavailable flag service must never break your application.

In a Server Component:

// app/dashboard/page.tsx
import { getFlagForUser } from '@/lib/flags';
import { getCurrentUserId } from '@/lib/auth';

export default async function DashboardPage() {
  const userId = await getCurrentUserId();
  const newDashboardEnabled = await getFlagForUser('new_dashboard', userId);

  return newDashboardEnabled ? <NewDashboard /> : <LegacyDashboard />;
}

Client Components

For flags that need to be available in interactive client components, fetch them once in a Server Component and pass them as props rather than making a second round-trip from the browser:

// app/dashboard/page.tsx (passing flag to client component)
export default async function DashboardPage() {
  const userId = await getCurrentUserId();
  const betaEditorEnabled = await getFlagForUser('beta_editor', userId);

  return <Editor isBetaEnabled={betaEditorEnabled} />;
}

// components/Editor.tsx
'use client';

interface EditorProps {
  isBetaEnabled: boolean;
}

export function Editor({ isBetaEnabled }: EditorProps) {
  return isBetaEnabled ? <BetaEditor /> : <StableEditor />;
}

This keeps flag evaluation entirely server-side and avoids a client-side fetch waterfall. The resolved boolean still travels to the browser as a serialized prop, but the evaluation endpoint and the rollout configuration never show up in the browser network tab.

Anonymous Users in Next.js

Read the anon_id cookie in your server-side flag fetcher and pass it when no authenticated user exists:

// lib/flags.ts (anonymous-aware version)
import { cookies } from 'next/headers';

export async function getFlagForUserOrAnon(flagKey: string, userId?: string): Promise<boolean> {
  // cookies() is async in Next.js 15+; awaiting it is harmless on older versions
  const effectiveId = userId ?? (await cookies()).get('anon_id')?.value;
  if (!effectiveId) return false;
  return getFlagForUser(flagKey, effectiveId);
}

Distributed Systems Consistency: What Can Go Wrong

In a single-process monolith, flag evaluation is trivially consistent. In a Django + FastAPI + Next.js stack, you have at least three independent processes, each with its own cache. Three failure modes to be aware of:

1. Cache drift between services. Django returns rollout_percentage: 50, but FastAPI has a stale cache entry showing rollout_percentage: 0. A user who should be in the 50% cohort sees the old experience. Mitigation: keep TTLs short (30s) and use the Pub/Sub invalidation above for explicit changes.

2. Next.js revalidate lag behind FastAPI. Next.js caches the FastAPI response for up to 30 seconds independently of FastAPI's Redis TTL. This means a flag change can take up to 60 seconds to reach the browser in the worst case (FastAPI cache TTL + Next.js revalidation window). For emergency kill-switches, use Next.js on-demand revalidation: tag the flag fetch (next: { tags: [...] }), expose a route handler that calls revalidateTag for that tag, and have Django's post-save signal call that handler.

3. Multiple FastAPI workers holding divergent in-process state. If you use any in-process caching on top of Redis (e.g., a Python dict), each Uvicorn worker holds its own copy. Invalidation via Pub/Sub only reliably clears this if all workers are subscribed. The safest approach: keep all caching in Redis and keep workers stateless.
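
The 60-second figure in the second failure mode is simple addition, but it generalises: every independent cache layer adds its own window to the worst case. A small sketch of that arithmetic, with worst_case_staleness as a hypothetical helper:

```python
def worst_case_staleness(cache_windows_seconds: list[float]) -> float:
    """Upper bound on propagation delay when independent caches are stacked.

    Each layer may have refreshed just before the layer below it picked up
    the change, so in the worst case the windows add up.
    """
    return sum(cache_windows_seconds)


redis_ttl = 30        # CACHE_TTL in the FastAPI service
next_revalidate = 30  # revalidate window on the Next.js fetch

print(worst_case_staleness([redis_ttl, next_revalidate]))  # 60
print(worst_case_staleness([next_revalidate]))             # 30, once Redis is invalidated on save
```

Adding an in-process cache inside each worker would add a third term, which is one more reason to keep workers stateless.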

Progressive Rollout Workflow

Follow this sequence for any non-trivial feature:

Deploy with the flag at 0%. Code is live but dormant. Confirm your monitoring is in place before moving on.
Ramp to 5% and monitor error rates, latency p95, and any feature-specific business metrics.
Step through 10% → 25% → 50% → 100% with deliberate observation gates between each step. Wait at least one full business cycle (e.g., a weekday) before moving past 25% for features with different weekday/weekend usage patterns.
Remove the flag once you've been at 100% for a comfortable window—typically a week. Flag debt is real technical debt.
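
The ramp described above can be encoded as data so that each step is a deliberate, reviewable action. This is a minimal sketch: RAMP_STEPS mirrors the percentages in the workflow, while next_percentage and the "drop to 0% on a failed gate" policy are assumptions for illustration.

```python
RAMP_STEPS = [0, 5, 10, 25, 50, 100]  # percentages from the workflow above


def next_percentage(current: int, gate_passed: bool) -> int:
    """Return the next rollout percentage, or 0 if the observation gate failed."""
    if not gate_passed:
        return 0  # assumed policy: roll back fully rather than hold at a bad step
    if current not in RAMP_STEPS:
        raise ValueError(f"{current}% is not a step in the ramp schedule")
    index = RAMP_STEPS.index(current)
    # Stay at 100% once the end of the schedule is reached.
    return RAMP_STEPS[min(index + 1, len(RAMP_STEPS) - 1)]


print(next_percentage(5, gate_passed=True))    # 10
print(next_percentage(50, gate_passed=False))  # 0
```

What counts as a passed gate (error rate, p95 latency, business metrics) is left to the caller; the function only decides where the percentage goes next.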

Kill-Switch Discipline

Any flag should be droppable to 0% in under 30 seconds. That means:

The Django admin update takes effect immediately in PostgreSQL.
The Pub/Sub invalidation clears Redis within milliseconds.
FastAPI's next evaluation reads 0% and returns false.
Next.js serves a fresh result once its revalidate window lapses, or on the next render if on-demand revalidation is wired up.

Test this process explicitly before you rely on it in an incident. Running a fire drill—"set flag to 0%, verify the feature disappears"—takes five minutes and is worth doing before the 25% gate.
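
A fire drill is easy to script. The sketch below sets a flag to 0%, polls until a probe user stops seeing the feature, and reports how long that took. The two callbacks are placeholders you would wire to your own admin API and evaluation endpoint; kill_switch_drill and its parameters are hypothetical, and the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


def kill_switch_drill(
    set_percentage: Callable[[int], None],
    is_enabled_for_probe: Callable[[], bool],
    budget_seconds: float = 30.0,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Set the flag to 0% and return how long the probe kept seeing it enabled.

    Raises TimeoutError if the flag is still on once the budget is spent.
    """
    started = clock()
    set_percentage(0)
    while is_enabled_for_probe():
        if clock() - started >= budget_seconds:
            raise TimeoutError(f"flag still enabled after {budget_seconds}s")
        sleep(poll_interval)
    return clock() - started
```

Pick a probe user who is inside the current cohort before the drill starts, otherwise the loop exits immediately and the measurement says nothing. Probing through the Next.js page rather than the FastAPI endpoint also exercises the frontend cache.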

Cleaning Up Flags

Set a calendar reminder when you hit 100%. A reasonable cleanup checklist:

Remove the flag check from application code.
Remove the FeatureFlag row from the database (or mark it archived).
Delete the Redis key.
Update tests that mock the flag.

Flags left in the codebase become load-bearing confusion within months. Engineers start assuming "this flag is always true" without checking, and the abstraction becomes invisible—until someone sets it to 0% by accident.
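
A calendar reminder can be backed by a periodic check. The sketch below lists flags that have sat at 100% for longer than a threshold, assuming updated_at reflects the last rollout change (auto_now bumps it on any save, so treat the result as a hint). FlagSnapshot and flags_due_for_cleanup are hypothetical names; in practice the snapshots would come from a FeatureFlag queryset.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FlagSnapshot:
    """Subset of FeatureFlag fields needed for the check (hypothetical helper type)."""
    key: str
    rollout_percentage: int
    updated_at: datetime


def flags_due_for_cleanup(
    flags: list[FlagSnapshot], now: datetime, max_age: timedelta = timedelta(days=7)
) -> list[str]:
    """Keys of flags that have sat at 100% for longer than max_age."""
    return sorted(
        f.key for f in flags
        if f.rollout_percentage == 100 and now - f.updated_at > max_age
    )


now = datetime(2024, 1, 31, tzinfo=timezone.utc)  # fixed date for a repeatable example
snapshots = [
    FlagSnapshot("new_dashboard", 100, now - timedelta(days=10)),
    FlagSnapshot("beta_editor", 100, now - timedelta(days=2)),
    FlagSnapshot("checkout_v2", 50, now - timedelta(days=30)),
]
print(flags_due_for_cleanup(snapshots, now))  # ['new_dashboard']
```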

Django Admin: Flag Management UI

The Django admin gives you a management UI with very little code. Register the model and add an inline for history:

# flags/admin.py
from django.contrib import admin
from .models import FeatureFlag, FeatureFlagHistory

class FlagHistoryInline(admin.TabularInline):
    model = FeatureFlagHistory
    extra = 0
    readonly_fields = ('changed_by', 'old_percentage', 'new_percentage', 'changed_at', 'note')
    can_delete = False

@admin.register(FeatureFlag)
class FeatureFlagAdmin(admin.ModelAdmin):
    list_display = ('key', 'enabled', 'rollout_percentage', 'updated_at')
    list_editable = ('rollout_percentage',)
    inlines = [FlagHistoryInline]

    def save_model(self, request, obj, form, change):
        obj._changed_by = request.user
        super().save_model(request, obj, form, change)

list_editable = ('rollout_percentage',) means you can update rollout percentages for multiple flags directly from the list view—no modal required, which matters when you're moving fast during an incident.

What This Pattern Buys You

The difference between containing a production incident in under five minutes via a flag versus an emergency deployment is stark. Decoupling deployment from release reduces blast radius, enables proper canary testing, and makes A/B experimentation a natural extension of your workflow.

The implementation above is self-contained and requires no third-party vendor. Layer on targeting rules (flag per user segment or organisation), percentage-based A/B experiments with metrics tracking, or webhook notifications when flags change—once the foundations are solid. Start with the PostgreSQL store, ship it, and iterate from there.