Rate Limiting and Throttling Patterns for FastAPI APIs in Production

Rate limiting feels optional until it isn't. Deploy a FastAPI service, traffic grows, one bad actor hammers an endpoint, and you're explaining an outage. Bolt this on early, not after the fire.

This guide covers practical patterns—from simple in-process solutions to distributed setups—with working code you can adapt today.

Why FastAPI Needs Explicit Rate Limiting

FastAPI provides no built-in throttling. That's a reasonable design choice, but it means you own this problem entirely. Your options split into three tiers:

In-process (single instance) — fast, zero dependencies, useless at scale
Shared state via Redis — the right answer for most production deployments
Upstream (reverse proxy / gateway) — for platform-level enforcement

Know which tier you need before choosing a solution.

Tier 1: In-Process Rate Limiting with SlowAPI

For single-instance deployments or local dev, slowapi integrates cleanly with FastAPI:

pip install slowapi

from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/search")
@limiter.limit("30/minute")
async def search(request: Request, q: str):
    return {"query": q}

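The key_func is critical. get_remote_address reads the peer address of the connection, which works locally, but behind a load balancer that peer is the balancer itself, so every client shares one bucket. Key on the client address from X-Forwarded-For or, better, on an authenticated user ID:

def get_user_id(request: Request) -> str:
    # Assumes X-User-ID is set by a trusted gateway or auth layer, never by the client.
    # Falls back to the remote address for unauthenticated requests.
    return request.headers.get("X-User-ID", get_remote_address(request))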
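
The X-Forwarded-For option mentioned above takes a little care, because clients can send their own value for that header. Here is a minimal sketch of one way to pick the address, assuming every trusted proxy appends the address it received the request from. The function name and the trusted_hops parameter are illustrative, not part of slowapi:

```python
from typing import Optional


def client_ip_from_forwarded(
    forwarded_for: Optional[str], peer_ip: str, trusted_hops: int = 1
) -> str:
    """Pick the client IP from an X-Forwarded-For header value.

    Assumption: each trusted proxy appends the address it received the
    request from, so the entry `trusted_hops` from the right was written
    by your own infrastructure. Anything further left is client-supplied.
    """
    if not forwarded_for or trusted_hops < 1:
        return peer_ip
    hops = [part.strip() for part in forwarded_for.split(",") if part.strip()]
    if len(hops) < trusted_hops:
        # Header is shorter than the proxy chain, so do not trust it.
        return peer_ip
    return hops[-trusted_hops]
```

A key_func could call this with request.headers.get("X-Forwarded-For") and the peer address. Set trusted_hops to the number of proxies you actually operate in front of the app; counting from the right is what keeps a spoofed leading entry from being used.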

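Limitation: state lives in memory, per process. Two instances (or two worker processes on one host) maintain separate counters, so the effective limit is multiplied by the number of processes. Don't use this behind a load balancer without moving to shared state.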

Tier 2: Redis-Backed Distributed Rate Limiting

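This is the tier most production deployments need. Redis provides atomic operations and TTL-based expiry, and a check typically costs about a millisecond or less when Redis sits on the same network as your app.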

pip install slowapi redis

from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(
    key_func=get_remote_address,
    storage_uri="redis://localhost:6379"
)

This single change shares rate limit state across all instances, provided every instance points at the same Redis.

Custom Middleware for Fine-Grained Control

When slowapi isn't flexible enough—say, you need tiered limits based on subscription level—use custom middleware:

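import redis.asyncio as aioredis
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
redis_client = aioredis.from_url("redis://localhost:6379")

RATE_LIMITS = {
    "free": (60, 60),      # 60 requests per 60 seconds
    "pro": (600, 60),
    "enterprise": (6000, 60),
}

@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    # X-User-Tier and X-User-ID are assumed to be set by a trusted auth layer
    # upstream; never accept them directly from clients.
    user_tier = request.headers.get("X-User-Tier", "free")
    if user_tier not in RATE_LIMITS:
        # Unknown tiers share the free bucket instead of creating new keys.
        user_tier = "free"
    client_host = request.client.host if request.client else "unknown"
    user_id = request.headers.get("X-User-ID", client_host)

    limit, window = RATE_LIMITS[user_tier]
    key = f"ratelimit:{user_tier}:{user_id}"

    current = await redis_client.incr(key)
    ttl = await redis_client.ttl(key)
    if ttl < 0:
        # No expiry yet: a new key, or an earlier EXPIRE never ran.
        await redis_client.expire(key, window)
        ttl = window

    if current > limit:
        # Return the response directly: an HTTPException raised inside
        # middleware is not handled by FastAPI's exception handlers.
        return JSONResponse(
            status_code=429,
            content={"detail": "Rate limit exceeded"},
            headers={
                "Retry-After": str(ttl),  # seconds until the window resets
                "X-RateLimit-Limit": str(limit),
                "X-RateLimit-Remaining": "0",
            },
        )

    response = await call_next(request)
    response.headers["X-RateLimit-Limit"] = str(limit)
    response.headers["X-RateLimit-Remaining"] = str(max(0, limit - current))
    return response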

Always include Retry-After and X-RateLimit-* headers. Clients that respect them back off gracefully; those that don't have no excuse.

Algorithm Choice Matters

Most tutorials stop at "use a counter." The algorithm you pick changes behavior significantly:

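Fixed window (shown above): Simple, but allows up to 2x the limit in a short burst, since a client can spend one full quota at the end of a window and another at the start of the next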
Sliding window log: Accurate, higher memory cost per user
Token bucket: Allows controlled bursting; good for occasional spikes
Leaky bucket: Smoothest output rate; best for protecting downstream services
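
To make the sliding window log entry concrete, here is a minimal in-memory sketch. It is illustrative only: the class name and the injectable clock are assumptions made for the example, and a distributed version would keep the timestamps in shared storage such as Redis instead of a local dict.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLog:
    """Allow at most `limit` requests per key in any `window`-second span."""

    def __init__(
        self,
        limit: int,
        window: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

The memory cost noted in the list is visible here: each key stores one timestamp per allowed request, up to the limit.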

For most API rate limiting, sliding window or token bucket is the right default. Fixed window is acceptable if you understand and accept boundary burst behavior.
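
For comparison, a token bucket can be sketched in a few lines. This is a single-process illustration under stated assumptions: the class name, the cost parameter, and the injectable clock are choices made for the example, not taken from any library.

```python
import time
from typing import Callable


class TokenBucket:
    """Hold up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(
        self,
        capacity: float,
        refill_rate: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock  # injectable so tests can control time
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Capacity sets the largest burst a client can send at once, and refill_rate sets the sustained rate, which is why this algorithm suits occasional spikes.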

Tier 3: Upstream Enforcement at the Proxy Layer

Application-layer rate limiting is necessary but insufficient. A connection flood still hits your app servers before they respond with 429. Enforce limits at the nginx or API gateway layer too.

Nginx example:

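# limit_req_zone belongs in the http {} context
limit_req_zone $binary_remote_addr zone=api:10m rate=100r/m;

# location belongs inside a server {} block
location /api/ {
    limit_req zone=api burst=20 nodelay;
    limit_req_status 429;  # nginx rejects with 503 unless told otherwise
    proxy_pass http://fastapi_backend;
}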

This rejects excess requests before they consume an application worker or a database connection. Proxy-layer limits handle raw volume; application-layer limits handle per-user logic.

Production Considerations

Exempt your health checks. Rate-limiting load balancer probes causes unnecessary instance cycling.

Log 429s separately. A spike in rate limit hits signals abuse, a client bug, or a misconfigured limit. Make this pattern visible.
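
One way to keep 429s visible is a dedicated logger that can be routed and alerted on independently of request logs. A minimal sketch using the standard library; the logger name "api.ratelimit" and the helper function are hypothetical:

```python
import logging

# Dedicated logger so rate limit hits can get their own handler and alerts.
rate_limit_log = logging.getLogger("api.ratelimit")


def log_rate_limit_hit(
    user_id: str, tier: str, path: str, current: int, limit: int
) -> None:
    """Record one rejected request in a consistent, greppable format."""
    rate_limit_log.warning(
        "rate_limit_exceeded user=%s tier=%s path=%s count=%d limit=%d",
        user_id,
        tier,
        path,
        current,
        limit,
    )
```

Calling this just before returning the 429 gives you one line per rejection, keyed by user and path.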

Don't rate limit uniformly across endpoints. Your /health, /auth/token, and /upload/large-file endpoints have different risk profiles. Tune them separately.
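
Per-endpoint tuning and health check exemption can be sketched as a small lookup that a middleware consults before counting a request. The numbers below are placeholders rather than recommendations, and the names are hypothetical:

```python
from typing import List, Optional, Tuple

# Paths that are never rate limited (load balancer probes).
EXEMPT_PATHS = {"/health"}

# (path prefix, (max requests, window in seconds)); first match wins.
ENDPOINT_LIMITS: List[Tuple[str, Tuple[int, int]]] = [
    ("/auth/token", (5, 60)),    # placeholder: tight limit on credential checks
    ("/upload/", (10, 3600)),    # placeholder: expensive, so few per hour
]

DEFAULT_LIMIT: Tuple[int, int] = (60, 60)


def limit_for_path(path: str) -> Optional[Tuple[int, int]]:
    """Return (limit, window) for a path, or None if the path is exempt."""
    if path in EXEMPT_PATHS:
        return None
    for prefix, limit in ENDPOINT_LIMITS:
        if path.startswith(prefix):
            return limit
    return DEFAULT_LIMIT
```

A middleware would skip counting when this returns None and otherwise include the matched prefix in the counter key, so each endpoint group gets its own bucket.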

Test your limits under load before production. A rate limiter never exercised in staging is a liability.

Get this infrastructure right once, and you won't think about it again until your next service. Rate limiting isn't glamorous, but it's the difference between graceful degradation and cascading failure.

Damian Hodgkiss
