Health Checks, Graceful Shutdown, and SIGTERM Handling for FastAPI and Next.js Containers on ECS

Running containerised services on ECS sounds straightforward until you've watched a deployment drop live traffic, or seen ECS keep routing requests to a container that's been wedged for ten minutes. Getting health checks, graceful shutdown, and SIGTERM handling right is the difference between a deployment that users never notice and one that generates a 3 a.m. incident. This guide gives you working patterns for both FastAPI and Next.js, including the edge cases that actually bite teams in production.

Why This Matters More Than You Think

ECS uses health information in two distinct places: the load balancer uses it to decide whether to route traffic to a target, and the ECS service scheduler uses it to decide whether to replace a task. If either signal is wrong — too slow, too lenient, or ignored — you end up with traffic hitting unhealthy containers or ECS killing containers before they've drained in-flight requests. Neither is acceptable in production.

There are three failure modes worth naming explicitly:

Dropped requests during deployment: ECS deregisters a task from the ALB and sends SIGTERM, but the container exits before the load balancer has finished draining active connections.
Zombie tasks: The application process has deadlocked or crashed internally, but the container is still running, so ECS never replaces it and the ALB keeps routing to it.
False-positive health check failures: The health check fires before the application has finished starting, so ECS kills and replaces a perfectly healthy container over and over.

All three are preventable with the patterns below.

FastAPI: Health Check Endpoint

Keep the health check endpoint trivially fast. It should not hit the database unless you specifically want to gate traffic on database connectivity. A dedicated /health route that returns immediately is enough for the load balancer.

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get("/health", include_in_schema=False)
async def health() -> JSONResponse:
    return JSONResponse({"status": "ok"})

Why `/health` and `/ready` Should Be Separate

If you want a richer readiness check that includes database connectivity or cache warmth, put it on a separate path such as /ready and wire it only to your internal monitoring — not the ALB health check. Here is why this separation matters in practice:

The ALB health check is a liveness signal. It answers the question: "Is this process alive and capable of handling HTTP traffic?" A simple 200 is sufficient.
A readiness check answers a broader question: "Has this container finished initialising, and are its dependencies reachable right now?" In this setup it feeds dashboards and alerts rather than the load balancer's routing decisions.

Mixing liveness and readiness on a single endpoint that hammers your database is a surprisingly common cascade-failure trigger under load. If your database becomes slow, every container simultaneously starts returning 5xx on the health check, the ALB marks all targets unhealthy, and you have a full outage — caused not by your application, but by your health check strategy.

A safe pattern for readiness with database verification:

from fastapi import FastAPI, status
from fastapi.responses import JSONResponse
import asyncpg  # or your async DB driver

app = FastAPI()

@app.get("/health", include_in_schema=False)
async def health() -> JSONResponse:
    # Fast liveness check — always responds immediately
    return JSONResponse({"status": "ok"})

@app.get("/ready", include_in_schema=False)
async def ready() -> JSONResponse:
    # Readiness check: expose this to internal monitoring only
    try:
        # timeout bounds connection setup so a slow database cannot hang the probe
        conn = await asyncpg.connect(dsn="...", timeout=2)
        try:
            await conn.execute("SELECT 1")
        finally:
            # Always release the connection, even if the query fails
            await conn.close()
        return JSONResponse({"status": "ready"})
    except Exception as exc:
        return JSONResponse(
            {"status": "not_ready", "detail": str(exc)},
            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
        )

Wire /health to the ALB. Wire /ready to a CloudWatch Synthetics canary or an internal HTTP probe (for example, a Prometheus blackbox-style check), not the ALB target group.
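If several monitors poll /ready, the probe itself can add load to a struggling database. One way to limit that, sketched below with the standard library only, is to bound each check with a timeout and cache the answer for a few seconds. The class name `CachedProbe`, its parameters, and the `check` coroutine are illustrative assumptions rather than part of FastAPI or asyncpg.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


class CachedProbe:
    """Run an async dependency check at most once per TTL, with a hard timeout.

    Hypothetical helper: `check` is any coroutine function that raises on failure.
    Create the instance inside the running event loop (e.g. at app startup).
    """

    def __init__(
        self,
        check: Callable[[], Awaitable[None]],
        ttl_seconds: float = 5.0,
        timeout_seconds: float = 1.0,
    ) -> None:
        self._check = check
        self._ttl = ttl_seconds
        self._timeout = timeout_seconds
        self._last_result: Optional[bool] = None
        self._last_run = 0.0
        self._lock = asyncio.Lock()

    async def is_ready(self) -> bool:
        # The lock ensures concurrent probes share one real check.
        async with self._lock:
            now = time.monotonic()
            if self._last_result is not None and now - self._last_run < self._ttl:
                return self._last_result  # serve the cached answer
            try:
                await asyncio.wait_for(self._check(), timeout=self._timeout)
                self._last_result = True
            except Exception:
                # Timeouts and driver errors both mean "not ready".
                self._last_result = False
            self._last_run = time.monotonic()
            return self._last_result
```

In the /ready handler you would await `probe.is_ready()` and return 503 when it is False. The TTL trades freshness for load, so keep it shorter than your monitoring interval.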

FastAPI: Handling SIGTERM Gracefully

By default, Uvicorn handles SIGTERM, but behaviour depends on how you invoke it. If you run it via a shell script or a process manager inside the container, signals can be swallowed. The safest pattern is to run Uvicorn directly as PID 1 using exec form in your Dockerfile.

# Use exec form — never shell form — so Uvicorn receives SIGTERM directly
ENTRYPOINT ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

The exec form (JSON array) means Docker sends SIGTERM straight to Uvicorn rather than to a shell. That single change fixes the most common graceful-shutdown failure on containerised Python services.

Uvicorn's Graceful-Shutdown Timeout

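Uvicorn supports a --timeout-graceful-shutdown flag (documented in Uvicorn's settings reference), which caps how long it waits for in-flight requests after SIGTERM before it starts terminating them. Set it to at least as long as your ALB deregistration delay, and keep it below the ECS stopTimeout so that Uvicorn exits on its own before SIGKILL arrives. If your ALB deregistration delay is 30 seconds, give Uvicorn at least 30 seconds to finish draining in-flight requests before it exits.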

# A multi-line instruction needs backslash continuations to be valid Dockerfile syntax
ENTRYPOINT ["uvicorn", "app.main:app", \
            "--host", "0.0.0.0", \
            "--port", "8000", \
            "--timeout-graceful-shutdown", "30"]

You can confirm the flag exists on your installed version with:

uvicorn --help | grep graceful

FastAPI Lifespan for Clean Startup and Shutdown

FastAPI's lifespan parameter (built on Starlette's lifespan support) is the idiomatic place to open and close resources such as database connection pools. The code before yield runs at startup. The code after yield runs during Uvicorn's shutdown sequence, once SIGTERM has been received and in-flight requests have been given the chance to finish.

from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: initialise connection pools, warm caches, etc.
    # create_db_pool() is a placeholder for your own factory (e.g. asyncpg.create_pool)
    app.state.db_pool = await create_db_pool()
    try:
        yield
    finally:
        # Shutdown: runs after Uvicorn has received SIGTERM and drained connections
        await app.state.db_pool.close()

app = FastAPI(lifespan=lifespan)

The shutdown block of lifespan runs before Uvicorn exits, provided the process is allowed to finish: a SIGKILL, or a forced exit triggered by a second signal, skips it. This is why your ECS stopTimeout must be long enough to allow the shutdown sequence to complete.

The Shell-Form Trap

Consider this common mistake:

# WRONG — shell form swallows SIGTERM
ENTRYPOINT uvicorn app.main:app --host 0.0.0.0 --port 8000

In shell form, Docker runs /bin/sh -c "uvicorn ...", so the shell is PID 1 and Uvicorn is typically its child process. When ECS sends SIGTERM, the shell receives it, and /bin/sh does not forward signals to its children. Uvicorn never sees SIGTERM; ECS waits out stopTimeout, then sends SIGKILL. Every in-flight request is terminated hard. If you genuinely need a wrapper script, end it with exec uvicorn ... so that Uvicorn replaces the shell as PID 1.
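A quick way to see whether a given entrypoint actually delivers SIGTERM to the Python process is to swap in a tiny stand-in service that does nothing but wait for the signal. The sketch below uses only asyncio; the function name `serve_until_sigterm` is made up for this example, and `add_signal_handler` is available on Unix event loops only.

```python
import asyncio
import os
import signal


async def serve_until_sigterm() -> str:
    """Block until SIGTERM arrives, then return so cleanup code can run."""
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    # Register the handler on the event loop (Unix only, main thread only).
    loop.add_signal_handler(signal.SIGTERM, stop.set)
    try:
        print(f"pid {os.getpid()} waiting for SIGTERM", flush=True)
        await stop.wait()
        return "sigterm-received"
    finally:
        # Restore default handling once we are done waiting.
        loop.remove_signal_handler(signal.SIGTERM)
```

Run it with `asyncio.run(serve_until_sigterm())` as the container's command, once in exec form and once in shell form, then stop the container. If the function never returns and the container is only killed after the stop timeout, the signal is not reaching Python.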

Next.js: Health Check Endpoint

Next.js App Router makes this trivial with a Route Handler:

// app/api/health/route.ts
import { NextResponse } from 'next/server';

export async function GET() {
  return NextResponse.json({ status: 'ok' });
}

Wire this path to your ALB target group health check. Set the healthy threshold to 2 and the interval to 15 seconds — aggressive enough to detect problems quickly, conservative enough to avoid false positives during cold starts.

Cold-Start Detection and the `startPeriod`

Next.js containers, particularly those using SSR or ISR, can take 10–30 seconds to be ready to serve requests on a cold start. If your ECS container health check fires before the server is listening, ECS counts those failures immediately. Without a startPeriod, three consecutive failures (with default settings) will cause ECS to mark the task as unhealthy and restart it — before it's ever had a chance to start properly.

Set startPeriod in your task definition health check to cover your worst-case cold-start time, plus a margin. A Next.js app that typically starts in 15 seconds should have a startPeriod of at least 30 seconds.
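The interaction between startPeriod, interval, and retries can be modelled roughly as follows. This is a simplified sketch, not ECS's exact scheduling: it assumes the first check runs one interval after start, and that failures inside the start period are ignored until the first success. The function name and parameters are illustrative.

```python
from typing import Iterable, Optional


def first_unhealthy_check(
    results: Iterable[bool],
    interval: int = 30,
    retries: int = 3,
    start_period: int = 0,
) -> Optional[int]:
    """Return the time (seconds after start) at which the container would be
    marked unhealthy, or None if it never is.

    Simplified model: check i runs at (i + 1) * interval seconds.
    """
    consecutive_failures = 0
    seen_success = False
    for i, passed in enumerate(results):
        at = (i + 1) * interval
        if passed:
            seen_success = True
            consecutive_failures = 0
            continue
        if at <= start_period and not seen_success:
            continue  # grace period: this failure is not counted
        consecutive_failures += 1
        if consecutive_failures >= retries:
            return at
    return None


# An app that needs about 50 seconds to start, checked every 15 seconds:
slow_start = [False, False, False, True, True]
print(first_unhealthy_check(slow_start, interval=15, retries=3))
print(first_unhealthy_check(slow_start, interval=15, retries=3, start_period=60))
```

Under this model the first call reports the task unhealthy at 45 seconds, before the app was ever ready, while the second call with a 60-second start period never marks it unhealthy.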

Next.js: Handling SIGTERM

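The default next start process handles SIGTERM without any extra code. The critical step is again your Dockerfile's ENTRYPOINT: use exec form so the Node process receives signals directly. The example below assumes the standalone build output (output: 'standalone' in your Next.js config), which produces a server.js that you run with node: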

ENTRYPOINT ["node", "server.js"]

If you need custom shutdown logic — closing database connections, flushing queues, draining a background job worker — register a handler explicitly:

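// server.ts (custom Next.js server)
// isShuttingDown and activeRequests are maintained by your request handler;
// jobQueue and dbPool stand in for your own queue client and connection pool.
let isShuttingDown = false;
let activeRequests = 0;
const DRAIN_DEADLINE_MS = 30_000; // keep this below the ECS stopTimeout

process.on('SIGTERM', async () => {
  console.log('SIGTERM received, beginning graceful shutdown');

  // Stop accepting new work
  isShuttingDown = true;

  try {
    // Wait for in-flight requests to complete, but never past the deadline
    const deadline = Date.now() + DRAIN_DEADLINE_MS;
    while (activeRequests > 0 && Date.now() < deadline) {
      await new Promise(resolve => setTimeout(resolve, 100));
    }

    // Flush pending queue jobs
    await jobQueue.close();

    // Close DB connections
    await dbPool.end();

    console.log('Graceful shutdown complete');
    process.exit(0);
  } catch (err) {
    // Exit non-zero so the failed shutdown is visible in task logs
    console.error('Graceful shutdown failed', err);
    process.exit(1);
  }
});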

For most Next.js deployments on ECS you won't need a custom server. The default next start handles it adequately as long as your task stop timeout exceeds your ALB deregistration delay. Custom servers are primarily useful when you have background workers, WebSocket connections, or queue consumers running in the same process.

ECS Task Definition: Getting the Timeouts Right

This is where most teams get burned. Three timeout values must be consistent:

Setting	Where	Recommended starting point
Deregistration delay	ALB target group	30 seconds
Health check grace period	ECS service	60 seconds
Task stop timeout	ECS task definition	60 seconds

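The task stop timeout must exceed your deregistration delay. ECS sends SIGTERM, waits up to stopTimeout for the container to exit, then sends SIGKILL if it is still running. If stopTimeout is shorter than the deregistration delay, your container can be killed whilst the load balancer is still draining it. The health check grace period is separate: it tells the ECS service scheduler to ignore failing load balancer health checks for that long after a task starts.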

{
  "stopTimeout": 60,
  "healthCheck": {
    "command": ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"],
    "interval": 15,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 30
  }
}

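The startPeriod gives your container time to initialise before ECS starts counting failed health checks against it. Set it to at least as long as your worst-case cold-start time. The command runs inside the container, so curl must be installed in the image, and the port must match the one your server listens on.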
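Slim Python base images often do not include curl. In that case the container health check can be a short Python script instead. The following is a minimal sketch using only the standard library; the file name healthcheck.py, the default URL, and the function names are assumptions for illustration.

```python
import urllib.request
from typing import Sequence

# Ignore any proxy environment variables: the check must hit the local server.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        # HTTP errors, refused connections and timeouts all count as failure.
        return False


def main(argv: Sequence[str]) -> int:
    """Exit-code style entry point: 0 means healthy, 1 means unhealthy."""
    url = argv[1] if len(argv) > 1 else "http://localhost:8000/health"
    return 0 if check(url) else 1
```

Saved as healthcheck.py with a final line such as `raise SystemExit(main(sys.argv))` (plus `import sys`), it could replace the curl command with something like `["CMD", "python", "/app/healthcheck.py"]`. Keep the script's timeout below the health check's own timeout value.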

The Deregistration Delay Race Condition

When ECS decides to stop a task (during a deployment or scale-in), the sequence is:

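1. ECS begins deregistering the task from the ALB target group, and the target enters the draining state.
2. The ALB stops sending new requests to the draining target, although a few can still arrive while the change propagates.
3. Requests already in flight are allowed to complete for up to the deregistration delay.
4. ECS sends SIGTERM to the container. Depending on what triggered the stop, this can arrive while the target is still draining, so plan for the two windows overlapping.
5. ECS waits up to stopTimeout for the container to exit, then sends SIGKILL if it is still running.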

If stopTimeout < deregistrationDelay, your container can be dead before the ALB has finished draining. Requests still in flight at that point fail, and clients typically see a 502 or 504 from the load balancer. The fix is simple: always set stopTimeout > deregistrationDelay, with a comfortable margin for your application's shutdown work on top.

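stopTimeout = deregistrationDelay + max_expected_shutdown_work_seconds + margin

For most applications: stopTimeout = 30 (drain) + 15 (shutdown work) + 15 (margin) = 60 seconds. Note that the ALB's default deregistration delay is 300 seconds, while Fargate caps stopTimeout at 120 seconds, so on Fargate you usually need to lower the deregistration delay explicitly.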
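The arithmetic above is easy to get wrong when values live in different places (target group, service, task definition), so it can help to encode it as a small check in your deployment tooling. The sketch below is one possible shape; the class name, field names, and the 120-second Fargate ceiling constant are assumptions written for this example.

```python
from dataclasses import dataclass
from typing import List

# Assumption: Fargate's documented ceiling for stopTimeout, in seconds.
FARGATE_MAX_STOP_TIMEOUT = 120


@dataclass(frozen=True)
class ShutdownBudget:
    deregistration_delay: int  # ALB target group setting, seconds
    shutdown_work: int         # worst-case application cleanup, seconds
    margin: int                # safety margin, seconds

    @property
    def stop_timeout(self) -> int:
        """stopTimeout needed to cover drain, cleanup and margin."""
        return self.deregistration_delay + self.shutdown_work + self.margin

    def problems(self, on_fargate: bool = True) -> List[str]:
        """Return human-readable issues with this budget (empty if none)."""
        issues = []
        if min(self.deregistration_delay, self.shutdown_work, self.margin) < 0:
            issues.append("all durations must be non-negative")
        if on_fargate and self.stop_timeout > FARGATE_MAX_STOP_TIMEOUT:
            issues.append(
                f"stopTimeout {self.stop_timeout}s exceeds the Fargate limit "
                f"of {FARGATE_MAX_STOP_TIMEOUT}s; lower the deregistration delay"
            )
        return issues


budget = ShutdownBudget(deregistration_delay=30, shutdown_work=15, margin=15)
print(budget.stop_timeout, budget.problems())
```

With the article's numbers the budget comes to 60 seconds with no problems reported. Leaving the deregistration delay at the ALB default of 300 seconds would be flagged on Fargate.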

The One Thing Most Teams Skip

Include a container-level health check in the task definition in addition to the ALB health check. The two are evaluated independently: the ALB check is an HTTP request made over the network from outside the task, while the container check is a command that runs inside the container, so it also covers tasks that sit behind no load balancer. Without a working health check, a container that's running but internally wedged (a deadlocked thread pool, an exhausted connection pool, a hung background job) keeps receiving traffic indefinitely.

The container health check is your backstop against zombie tasks. ECS uses consecutive failures to mark the task unhealthy and replace it, even if the ALB never noticed anything wrong (the ALB only sees whether the health endpoint returns the expected status code in time, not whether your application is actually making progress).
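If "making progress" matters for your service, for example a background worker loop running beside the HTTP server, one option is a heartbeat that the worker updates and the container health check inspects. The sketch below is a minimal standard-library version; the `Heartbeat` class and its parameter names are hypothetical.

```python
import time
from typing import Optional


class Heartbeat:
    """Track when a worker last reported progress."""

    def __init__(self, max_age_seconds: float) -> None:
        self._max_age = max_age_seconds
        self._last_beat = time.monotonic()

    def beat(self) -> None:
        """Call from the worker loop each time it completes a unit of work."""
        self._last_beat = time.monotonic()

    def is_alive(self, now: Optional[float] = None) -> bool:
        """True if the last beat is recent enough; `now` is injectable for tests."""
        current = time.monotonic() if now is None else now
        return current - self._last_beat <= self._max_age
```

A handler used only by the container health check could return 503 when `is_alive()` is False, so that a stalled worker leads ECS to replace the task. Choose `max_age_seconds` generously, since an idle but healthy worker must still call `beat()` often enough to stay under it.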

Quick Reference: What Goes Where

What	FastAPI	Next.js
Liveness health endpoint	`GET /health` → 200	`GET /api/health` → 200
Readiness check (monitoring only)	`GET /ready` → checks DB	Custom Route Handler
SIGTERM handling	Uvicorn via exec form + `lifespan` shutdown	`next start` via exec form + optional `process.on('SIGTERM')`
Graceful-shutdown timeout	`--timeout-graceful-shutdown 30`	Task stop timeout only
Entrypoint form	`["uvicorn", "app.main:app", ...]`	`["node", "server.js"]`

Summary

Get these five things right (exec-form entrypoints, fast dedicated liveness endpoints, separate readiness checks, consistent timeout values with stopTimeout > deregistrationDelay, and container-level health checks) and your ECS deployments can be genuinely zero-downtime. The failure modes are predictable, and the fixes are mostly configuration rather than code. The teams that still drop traffic at deploy time are almost always missing one of these five.

Damian Hodgkiss