Rate Limiting and Throttling Patterns for FastAPI APIs in Production
Practical rate limiting and throttling strategies for FastAPI—from in-process solutions to distributed Redis setups. Production-tested patterns with working code.
Rate limiting feels optional until it isn't. Deploy a FastAPI service, traffic grows, one bad actor hammers an endpoint, and you're explaining an outage. Bolt this on early, not after the fire.
This guide covers practical patterns—from simple in-process solutions to distributed setups—with working code you can adapt today.
Why FastAPI Needs Explicit Rate Limiting
FastAPI provides no built-in throttling. That's a reasonable design choice, but it means you own this problem entirely. Your options split into three tiers:
- In-process (single instance): fast, zero dependencies, but counters are not shared once you run more than one instance
- Shared state via Redis: the right answer for most production deployments
- Upstream (reverse proxy / gateway): for platform-level enforcement
Know which tier you need before choosing a solution.
Tier 1: In-Process Rate Limiting with SlowAPI
For single-instance deployments or local dev, slowapi integrates cleanly with FastAPI: you attach a Limiter to the app and decorate the routes you want to throttle.
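The key_func is critical: it decides which requests share a counter. get_remote_address works locally, but behind a load balancer every request typically appears to come from the balancer's IP, so all clients end up sharing one limit. Key on X-Forwarded-For or an authenticated user ID instead.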
Limitation: state lives in memory. Two instances maintain separate counters. Don't use this behind a load balancer without moving to shared state.
Tier 2: Redis-Backed Distributed Rate Limiting
This is the tier to use in production. Redis provides atomic operations, TTL-based expiry, and typically sub-millisecond overhead when it sits on the same network as your app.
Pointing the limiter's storage at Redis is a single configuration change, and it makes your rate limit state shared across all instances.
Custom Middleware for Fine-Grained Control
When slowapi isn't flexible enough (say, you need tiered limits based on subscription level), write custom middleware that looks up the caller's tier and counts requests against that tier's limit.
Always include Retry-After and X-RateLimit-* headers. Clients that respect them back off gracefully; those that don't have no excuse.
Algorithm Choice Matters
Most tutorials stop at "use a counter." The algorithm you pick changes behavior significantly:
- Fixed window (shown above): Simple, but allows up to 2x the limit in a burst that straddles a window boundary
- Sliding window log: Accurate, higher memory cost per user
- Token bucket: Allows controlled bursting; good for occasional spikes
- Leaky bucket: Smoothest output rate; best for protecting downstream services
For most API rate limiting, sliding window or token bucket is the right default. Fixed window is acceptable if you understand and accept boundary burst behavior.
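For reference, a token bucket fits in a few lines. This is a single-process sketch with an injectable clock (an assumption made for testability); a distributed version would keep the token count and timestamp in shared storage and update them atomically.

```python
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock=time.monotonic) -> None:
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self._clock = clock
        self._tokens = capacity     # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

`capacity` controls how large a burst you tolerate and `rate` controls the sustained throughput, so the two can be tuned independently.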
Tier 3: Upstream Enforcement at the Proxy Layer
Application-layer rate limiting is necessary but insufficient. A connection flood still hits your app servers before they respond with 429. Enforce limits at the nginx or API gateway layer too.
Nginx example:
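A minimal configuration sketch using nginx's `limit_req` module. The zone name, rate, burst size, and upstream address are assumptions to adapt to your own traffic.

```nginx
# In the http {} context: key by client IP, 10 MB of shared state, 10 requests/second.
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

server {
    listen 80;

    location / {
        # Absorb bursts of up to 20 requests without delay; reject the rest.
        limit_req zone=api_limit burst=20 nodelay;
        limit_req_status 429;    # nginx defaults to 503

        proxy_pass http://127.0.0.1:8000;    # assumed FastAPI upstream
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```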
Limiting at the proxy rejects excess requests before they consume an application worker or a database connection. Proxy-layer limits handle raw volume; application-layer limits handle per-user logic.
Production Considerations
Exempt your health checks. If load balancer probes receive a 429, the balancer treats healthy instances as failed and cycles them needlessly.
Log 429s separately. A spike in rate limit hits signals abuse, a client bug, or a misconfigured limit. Make this pattern visible.
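One lightweight way to do this is a dedicated logger that your limiter calls on every rejection, so 429s can be routed, filtered, and alerted on independently of general request logs. The logger name and field names below are assumptions.

```python
import logging

# Hypothetical logger name; route it to its own handler or index in your log setup.
rate_limit_log = logging.getLogger("api.ratelimit")


def log_rejection(client_key: str, path: str, limit: int) -> None:
    """Record one rejected request with structured fields for later aggregation."""
    rate_limit_log.warning(
        "rate_limited",
        extra={"client_key": client_key, "path": path, "limit": limit},
    )
```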
Don't rate limit uniformly across endpoints. Your /health, /auth/token, and /upload/large-file endpoints have different risk profiles. Tune them separately.
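A per-endpoint policy can be as simple as a lookup table consulted by your middleware. The limits below are placeholders, and `None` marks a path as exempt, which also covers the health check case.

```python
from typing import Optional

# Hypothetical policy table: requests per minute by path prefix; None means exempt.
ENDPOINT_LIMITS = {
    "/health": None,           # load balancer probes are never throttled
    "/auth/token": 5,          # credential-guessing target: keep this tight
    "/upload/large-file": 2,   # expensive per request
}
DEFAULT_LIMIT = 60


def limit_for(path: str) -> Optional[int]:
    """Return the per-minute limit for a path, or None if it is exempt."""
    # Longest matching prefix wins, so nested routes inherit the nearest policy.
    for prefix in sorted(ENDPOINT_LIMITS, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return ENDPOINT_LIMITS[prefix]
    return DEFAULT_LIMIT
```

Matching on `prefix + "/"` rather than a bare prefix keeps an unrelated path such as `/healthcheck` from inheriting the `/health` exemption.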
Test your limits under load before production. A rate limiter never exercised in staging is a liability.
Get this infrastructure right once, and you won't think about it again until your next service. Rate limiting isn't glamorous, but it's the difference between graceful degradation and cascading failure.
Damian Hodgkiss
Senior Staff Engineer at Sumo Group, leading development of AppSumo marketplace. Technical solopreneur with 25+ years of experience building SaaS products.