
Observability with OpenTelemetry: Distributed Tracing Across Next.js, FastAPI, and Django

Instrument three-tier stacks with OpenTelemetry. Propagate trace context across service boundaries and ship spans to your own collector.

nextjs · fastapi · django

Modern stacks are rarely monolithic. If you're running a Next.js frontend talking to a FastAPI service that delegates work to a Django backend, you've got three services, two runtimes, two languages, and a lot of surface area for things to go wrong silently. OpenTelemetry gives you the connective tissue to trace a single user request across all of it, without locking yourself into a vendor's proprietary SDK.

This tutorial walks you through instrumenting all three layers with OpenTelemetry, propagating trace context across service boundaries, and shipping spans to a collector you control. It builds on the structured logging guide for this stack—if you've already got correlated logs in place, distributed tracing is the natural next layer.


Why OpenTelemetry and Not Something Proprietary

I've watched teams burn months migrating away from vendor-specific tracing libraries. OpenTelemetry is a CNCF project and the vendor-neutral standard for traces, metrics, and logs, so your instrumentation code stays portable whether you're sending data to Jaeger, Tempo, Honeycomb, or Datadog. Pick your backend later; instrument once.

The other reason: OpenTelemetry is now the officially recommended approach in the Next.js documentation, which means framework-level spans (routing, rendering, data fetching) are available out of the box when you wire up the SDK correctly.


The Architecture We're Instrumenting

Browser → Next.js (Node.js) → FastAPI (Python) → Django (Python)

Each hop is an HTTP call. The goal is one coherent trace spanning all three services, visible in a single waterfall view. Each service creates child spans parented to the incoming request's span, all sharing the same traceId.
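To make the parent/child relationship concrete, here is a toy model in plain Python, not the OpenTelemetry SDK: the first service mints a trace ID, and each downstream hop keeps that ID while creating a new span ID whose parent is the caller's span. The `Span`, `root_span`, and `child_of` names are hypothetical, chosen for illustration.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Toy span record holding only the IDs that link spans into a trace."""

    service: str
    name: str
    trace_id: str             # 32 hex chars, shared by every span in the trace
    span_id: str              # 16 hex chars, unique to this span
    parent_id: Optional[str]  # caller's span_id; None for the root span


def root_span(service: str, name: str) -> Span:
    # The first service in the chain starts a brand-new trace.
    return Span(service, name, secrets.token_hex(16), secrets.token_hex(8), None)


def child_of(parent: Span, service: str, name: str) -> Span:
    # Each downstream hop keeps the trace ID and points at the caller's span.
    return Span(service, name, parent.trace_id, secrets.token_hex(8), parent.span_id)


if __name__ == "__main__":
    nextjs = root_span("nextjs-frontend", "GET /api/orders")
    fastapi = child_of(nextjs, "fastapi-service", "POST /orders")
    django = child_of(fastapi, "django-backend", "GET /internal/orders")
    for span in (nextjs, fastapi, django):
        print(f"{span.service:<16} trace={span.trace_id} parent={span.parent_id}")
```

All three records print the same trace ID, and each parent value is the previous hop's span ID, which is exactly the linkage Jaeger uses to draw the waterfall.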


Key Concepts Before You Start

| Term | What it means in practice |
|---|---|
| Trace | The full journey of one request across all services |
| Span | A single unit of work within a trace (e.g. one HTTP handler, one DB query) |
| Context propagation | Passing traceparent/tracestate headers between services so spans link up |
| Collector | A vendor-neutral proxy that receives spans and forwards them to your backend |
| Sampler | Decides which traces to keep; critical for controlling volume at scale |

Step 1: Run a Local Collector and Jaeger

Start with a docker-compose.yml:

```yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./otel-collector-config.yaml:/etc/otel/config.yaml
    command: ["--config=/etc/otel/config.yaml"]
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
    depends_on:
      - jaeger

  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      # Accept OTLP from the collector (already the default in recent releases).
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "16686:16686" # UI
```

otel-collector-config.yaml, a starter config with batching and 10% probabilistic sampling keyed on the trace ID (this is not tail sampling, which needs a different processor):

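```yaml
receivers:
  otlp:
    protocols:
      grpc:
        # Listen on all interfaces so traffic from outside the container arrives.
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  probabilistic_sampler:
    # Keep 10% of traces. Adjust to taste; 100% is fine locally.
    sampling_percentage: 10

  batch:
    # Flush a batch once 512 spans are queued or 5 s have passed, whichever
    # comes first. Reduces export round-trips and smooths load on the backend.
    send_batch_size: 512
    timeout: 5s

exporters:
  # Jaeger ingests OTLP natively; the dedicated jaeger exporter has been
  # removed from recent collector releases.
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Sample first so batches only contain spans you intend to keep.
      processors: [probabilistic_sampler, batch]
      exporters: [otlp/jaeger]
```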

Local vs production: For local development, set sampling_percentage: 100 so you see every trace. Drop it to 10–20% in staging/production to reduce storage costs without losing statistical coverage.
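Sampling a percentage of traces (rather than of individual spans) only works because the decision is derived from the trace ID: every span carrying the same ID gets the same verdict, so kept traces arrive complete. The sketch below illustrates that property with a seeded SHA-256 bucket. It is a stand-in, not the collector's actual algorithm (the probabilistic_sampler uses its own hash function), and `keep_trace` and `sample_spans` are hypothetical helpers.

```python
import hashlib
import random
from collections import Counter
from typing import Iterable, List, Tuple


def keep_trace(trace_id: str, sampling_percentage: float, seed: int = 0) -> bool:
    """Keep/drop decision that depends only on the trace ID (and a seed).

    Illustrative: the collector uses its own hash, but the property is the
    same: one trace ID, one verdict, no matter which service sent the span.
    """
    digest = hashlib.sha256(f"{seed}:{trace_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")          # uniform in [0, 2**64)
    threshold = int(sampling_percentage / 100 * 2**64)  # integer math avoids float edge cases
    return bucket < threshold


def sample_spans(spans: Iterable[Tuple[str, str]], pct: float) -> List[Tuple[str, str]]:
    # spans are (service, trace_id) pairs arriving independently at the collector.
    return [s for s in spans if keep_trace(s[1], pct)]


if __name__ == "__main__":
    rng = random.Random(42)
    trace_ids = [f"{rng.getrandbits(128):032x}" for _ in range(10_000)]
    services = ("nextjs-frontend", "fastapi-service", "django-backend")
    spans = [(svc, tid) for tid in trace_ids for svc in services]
    kept = sample_spans(spans, 10)
    per_trace = Counter(tid for _, tid in kept)
    print(f"kept {len(per_trace)} of {len(trace_ids)} traces")
    print("every kept trace is complete:", set(per_trace.values()) == {3})
```

Running it keeps roughly 1,000 of the 10,000 simulated traces, and because the verdict is a pure function of the trace ID, each kept trace still has spans from all three services.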

Run docker compose up -d.


Step 2: Instrument Next.js

Next.js has first-class OpenTelemetry support via its instrumentation file convention. There are two approaches: the quick path with @vercel/otel, and a fully manual SDK setup if you need more control.

Option A — Quick setup with @vercel/otel (recommended)

```bash
npm install @vercel/otel @opentelemetry/sdk-logs @opentelemetry/api-logs @opentelemetry/instrumentation
```

Create instrumentation.ts in your project root, or inside src/ if your project uses one (not inside app/ or pages/):

```ts
import { registerOTel } from '@vercel/otel'

export function register() {
  registerOTel({
    serviceName: 'nextjs-frontend',
    // Point at your collector when self-hosting
    traceExporter: 'otlp',
  })
}
```

This is the approach documented by the Next.js team and gives you framework spans for routing, rendering, and fetch calls without any additional configuration.

Option B — Manual SDK setup (more control)

Use this when you need custom span processors, custom resource attributes, or to mix in your own instrumentation libraries:

```bash
npm install @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions
```
```ts
// instrumentation.ts
export async function register() {
  // The Node.js SDK can't run in the Edge runtime, so only load it for Node.js.
  if (process.env.NEXT_RUNTIME !== 'nodejs') return;

  const { NodeSDK } = await import('@opentelemetry/sdk-node');
  const { getNodeAutoInstrumentations } = await import('@opentelemetry/auto-instrumentations-node');
  const { OTLPTraceExporter } = await import('@opentelemetry/exporter-trace-otlp-http');
  // resourceFromAttributes is the @opentelemetry/resources 2.x API;
  // 1.x releases used `new Resource(...)` instead.
  const { resourceFromAttributes } = await import('@opentelemetry/resources');
  const { ATTR_SERVICE_NAME, ATTR_SERVICE_VERSION } = await import('@opentelemetry/semantic-conventions');

  const sdk = new NodeSDK({
    resource: resourceFromAttributes({
      [ATTR_SERVICE_NAME]: 'nextjs-frontend',
      [ATTR_SERVICE_VERSION]: process.env.APP_VERSION ?? 'dev',
    }),
    // No url argument: the exporter reads OTEL_EXPORTER_OTLP_ENDPOINT (and
    // appends /v1/traces), defaulting to http://localhost:4318/v1/traces.
    traceExporter: new OTLPTraceExporter(),
    instrumentations: [getNodeAutoInstrumentations()],
  });

  sdk.start();
}
```

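With Option B, the auto-instrumentations (which include instrumentation for Node.js's built-in fetch) add traceparent headers to outbound requests, so calls to FastAPI carry trace context without any manual header injection in Node.js. Whichever option you pick, confirm in Jaeger that the FastAPI span is nested under the Next.js span; if it isn't, the outbound fetch is not propagating context.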


Step 3: Instrument FastAPI

```bash
pip install opentelemetry-sdk \
  opentelemetry-instrumentation-fastapi \
  opentelemetry-exporter-otlp-proto-http \
  opentelemetry-instrumentation-httpx
```
```python
# tracing.py
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.resources import SERVICE_NAME, SERVICE_VERSION, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor


def setup_tracing(app):
    resource = Resource.create({
        SERVICE_NAME: os.getenv("OTEL_SERVICE_NAME", "fastapi-service"),
        SERVICE_VERSION: os.getenv("APP_VERSION", "dev"),
    })
    provider = TracerProvider(resource=resource)
    # No endpoint argument: the exporter reads OTEL_EXPORTER_OTLP_ENDPOINT
    # (appending /v1/traces) and defaults to http://localhost:4318/v1/traces.
    exporter = OTLPSpanExporter()
    # Export in a background thread instead of on the request path.
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)
    # Adds middleware that reads the incoming traceparent header.
    FastAPIInstrumentor.instrument_app(app, tracer_provider=provider)
```

```python
# main.py
from fastapi import FastAPI

from tracing import setup_tracing

app = FastAPI()
setup_tracing(app)  # runs at import time, before the server starts serving
```

FastAPIInstrumentor reads the incoming traceparent header and creates a child span automatically. The Resource with SERVICE_NAME is important—without it, Jaeger labels your service as unknown_service and traces become hard to filter.
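Under the hood, that extraction means parsing the W3C traceparent header, which has four dash-separated lowercase hex fields: version, trace ID, parent span ID, and trace flags. The parser below is a simplified standalone sketch (version 00 only, tracestate ignored) meant to show what the instrumentation looks for; `parse_traceparent` and `IncomingContext` are hypothetical names, and in a real service FastAPIInstrumentor does this for you.

```python
import re
from typing import NamedTuple, Optional

# version "00" - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class IncomingContext(NamedTuple):
    trace_id: str
    parent_span_id: str
    sampled: bool


def parse_traceparent(header: Optional[str]) -> Optional[IncomingContext]:
    """Return the caller's context, or None if the header is missing or invalid.

    A None result is exactly the case where the receiving service has nothing
    to attach to and starts a new trace.
    """
    if not header:
        return None
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid per the W3C Trace Context spec.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    # Bit 0 of trace-flags is the "sampled" flag.
    return IncomingContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


if __name__ == "__main__":
    # Example value from the W3C Trace Context specification.
    print(parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
```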


Step 4: Instrument Django

```bash
pip install opentelemetry-instrumentation-django \
  opentelemetry-exporter-otlp-proto-http
```

Add this to manage.py after DJANGO_SETTINGS_MODULE is set and before execute_from_command_line runs. manage.py only runs for management commands such as runserver, so for Gunicorn or Uvicorn in production put the same block in wsgi.py or asgi.py (again after the settings module is set):

```python
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.django import DjangoInstrumentor
from opentelemetry.sdk.resources import SERVICE_NAME, SERVICE_VERSION, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

resource = Resource.create({
    SERVICE_NAME: os.getenv("OTEL_SERVICE_NAME", "django-backend"),
    SERVICE_VERSION: os.getenv("APP_VERSION", "dev"),
})
provider = TracerProvider(resource=resource)
# No endpoint argument: the exporter reads OTEL_EXPORTER_OTLP_ENDPOINT
# (appending /v1/traces) and defaults to http://localhost:4318/v1/traces.
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
# Inserts middleware that reads traceparent and opens a span per request.
DjangoInstrumentor().instrument()
```

You can then drive the service name and collector address entirely from the environment, without any code change. OTEL_EXPORTER_OTLP_ENDPOINT takes the collector's base URL; the exporter adds /v1/traces itself:

```bash
export OTEL_SERVICE_NAME=django-backend
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
```
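Whether the endpoint value ends in /v1/traces matters. For OTLP over HTTP, the SDKs treat OTEL_EXPORTER_OTLP_ENDPOINT as a base URL and append the signal path themselves, while the trace-specific OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used exactly as given, and an endpoint passed in code takes precedence over both. The function below models that resolution order as described in the OTLP exporter specification; `resolve_traces_endpoint` is a hypothetical helper, not an SDK function.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE = "http://localhost:4318"


def resolve_traces_endpoint(env: Mapping[str, str] = os.environ,
                            explicit: Optional[str] = None) -> str:
    """Model of how an OTLP/HTTP trace exporter picks the URL it posts to."""
    if explicit:
        return explicit  # a URL passed in code is used verbatim
    signal = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if signal:
        return signal  # per-signal variable: used as-is, no path appended
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_BASE)
    return base.rstrip("/") + "/v1/traces"  # generic variable: path appended


if __name__ == "__main__":
    print(resolve_traces_endpoint({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector:4318"}))
```

Setting the generic variable to a URL that already ends in /v1/traces therefore yields a doubled path such as http://otel-collector:4318/v1/traces/v1/traces, which the collector's OTLP receiver does not serve.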

Step 5: Inject Context on Outbound HTTP in Python

This is the critical step that trips up more engineers than anything else. When FastAPI calls Django via httpx or requests, you must inject trace context into outbound headers:

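```python
import httpx
from opentelemetry.propagate import inject


def call_django(path: str) -> httpx.Response:
    headers: dict[str, str] = {}
    inject(headers)  # populates traceparent (and tracestate, when present)
    # "django-backend" is the Django service's hostname on your network.
    with httpx.Client() as client:
        return client.get(f"http://django-backend{path}", headers=headers)
```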

Without inject(), the Django span starts a brand-new trace and you lose end-to-end visibility entirely. The traceparent header encodes the current traceId and spanId; Django's instrumentation reads it on arrival and creates a child span automatically.
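To see what inject() actually writes, here is a standalone sketch that formats a traceparent value for an outbound call: the trace ID travels unchanged and the parent-id field carries the ID of the span making the call. It follows the W3C Trace Context format; `make_traceparent` and `outbound_headers` are hypothetical helpers, not the OpenTelemetry propagator API.

```python
import secrets
from typing import Dict


def make_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Format a version-00 W3C traceparent header value."""
    if not 0 < trace_id < 1 << 128:
        raise ValueError("trace_id must be a non-zero 128-bit integer")
    if not 0 < span_id < 1 << 64:
        raise ValueError("span_id must be a non-zero 64-bit integer")
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def outbound_headers(trace_id: int, current_span_id: int) -> Dict[str, str]:
    # The trace ID travels unchanged; the parent-id field carries the ID of
    # the span making this call, so the receiver's span becomes its child.
    return {"traceparent": make_traceparent(trace_id, current_span_id)}


if __name__ == "__main__":
    trace_id = secrets.randbits(128) or 1      # avoid the invalid all-zero ID
    fastapi_span_id = secrets.randbits(64) or 1
    print(outbound_headers(trace_id, fastapi_span_id))
```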

If you're using httpx throughout your app, you can also rely on opentelemetry-instrumentation-httpx (already in the Step 3 install list) and call HTTPXClientInstrumentor().instrument() at startup. It patches httpx clients to inject context automatically, the same way Node.js auto-instrumentation handles fetch.

```python
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

HTTPXClientInstrumentor().instrument()  # once, at application startup
```
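The mechanism behind client instrumentation is essentially monkeypatching: wrap the client's send path once, and every request from every instance picks up context headers. The toy below illustrates the idea with a hypothetical `TinyClient` class and a stand-in `current_traceparent()` function; HTTPXClientInstrumentor applies the same principle inside httpx, though its internals differ.

```python
import functools
from typing import Callable, Dict, Optional


def current_traceparent() -> str:
    # Hypothetical stand-in for "the active span's context"; fixed for the demo.
    return "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"


class TinyClient:
    """Hypothetical HTTP client that returns the headers it would send."""

    def send(self, url: str, headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
        return dict(headers or {})


def instrument(cls: type, get_traceparent: Callable[[], str]) -> None:
    """Patch cls.send once so every call, from any instance, carries context."""
    original = cls.send
    if getattr(original, "_instrumented", False):
        return  # already patched; wrapping twice would stack wrappers

    @functools.wraps(original)
    def send(self, url, headers=None):
        headers = dict(headers or {})
        # Toy policy: keep a header the caller already injected by hand.
        headers.setdefault("traceparent", get_traceparent())
        return original(self, url, headers)

    send._instrumented = True
    cls.send = send


if __name__ == "__main__":
    instrument(TinyClient, current_traceparent)
    print(TinyClient().send("http://django-backend/internal/orders"))
```

Because the patch lives on the class, clients created anywhere in the app are covered, which is why a single startup call is enough.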

Step 6: Add Custom Spans Around Business Logic

Auto-instrumentation gives you HTTP-level spans. For production debugging you'll want spans around meaningful units of work—a payment processor, a permission check, a slow database query.

Python (FastAPI or Django):

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


def process_order(order_id: str):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        span.set_attribute("order.source", "web")
        # ... business logic
```

TypeScript (Next.js):

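```ts
import { SpanStatusCode, trace } from '@opentelemetry/api';

const tracer = trace.getTracer('nextjs-frontend');

// Server-side fetch in Node.js needs an absolute URL. API_BASE_URL is a
// hypothetical env var naming wherever your API routes are served.
const API_BASE_URL = process.env.API_BASE_URL ?? 'http://localhost:3000';

async function fetchUserData(userId: string) {
  return tracer.startActiveSpan('fetchUserData', async (span) => {
    try {
      span.setAttribute('user.id', userId);
      const res = await fetch(`${API_BASE_URL}/api/users/${userId}`);
      // Await the body inside the span so parsing time is included.
      return await res.json();
    } catch (err) {
      // Flag the span as failed so it stands out in the waterfall.
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```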

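Name attributes after the OpenTelemetry semantic conventions where one exists (for example http.request.method, which replaced the older http.method), and use a namespaced, dotted name such as order.id for your own domain attributes. Consistent names make it far easier to query spans across services.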


Verifying the Trace

Open Jaeger at http://localhost:16686, search for service nextjs-frontend, and open a recent trace. You should see spans from all three services nested in one waterfall under the same traceId (plus any framework and client spans the instrumentation adds):

```
nextjs-frontend   [========== GET /api/orders ==========]
fastapi-service       [====== POST /orders ======]
django-backend            [= GET /internal/orders =]
```

Debugging a broken trace:

| Symptom | Likely cause |
|---|---|
| Django shows a separate traceId | inject() was not called on the outbound Python HTTP client (and httpx is not auto-instrumented) |
| FastAPI spans missing | setup_tracing(app) never ran on the app object your server actually serves, or ran after the app had started |
| Spans appear but service name is unknown_service | Resource not configured with SERVICE_NAME |
| No spans at all from Next.js | instrumentation.ts not in the project root (or src/), or experimental.instrumentationHook: true missing in Next.js versions before 15 |
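If you can pull raw spans out of your backend (or a file exporter), a short script can flag the first symptom in the table automatically. The sketch below assumes each span is available as a (service, trace_id, span_id, parent_id) record, a hypothetical shape you would adapt to your backend's output; it reports downstream services that started their own traces and spans whose parent never arrived.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Set


class SpanRow(NamedTuple):
    service: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]


def find_broken_links(spans: List[SpanRow], entry_service: str) -> Dict[str, Set[str]]:
    """Report the usual signs of lost context propagation.

    - "unexpected_roots": services other than the entry point that started
      their own trace (no parent), typically a missing inject().
    - "missing_parents": traces whose spans point at a parent that was never
      exported (dropped, or sent to another backend).
    """
    problems: Dict[str, Set[str]] = defaultdict(set)
    by_trace: Dict[str, List[SpanRow]] = defaultdict(list)
    for span in spans:
        by_trace[span.trace_id].append(span)

    for trace_id, trace_spans in by_trace.items():
        ids = {s.span_id for s in trace_spans}
        for s in trace_spans:
            if s.parent_id is None and s.service != entry_service:
                problems["unexpected_roots"].add(s.service)
            elif s.parent_id is not None and s.parent_id not in ids:
                problems["missing_parents"].add(trace_id)
    return dict(problems)


if __name__ == "__main__":
    spans = [
        SpanRow("nextjs-frontend", "t1", "a1", None),
        SpanRow("fastapi-service", "t1", "b1", "a1"),
        SpanRow("django-backend", "t2", "c1", None),  # context was lost on this hop
    ]
    print(find_broken_links(spans, "nextjs-frontend"))
```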

Production-Ready Collector Config

For a production deployment, extend the collector config with memory limits and OTLP forwarding to a managed backend (e.g. Grafana Cloud, Honeycomb):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    # Prevents the collector OOMing under spike load.
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  probabilistic_sampler:
    # 10% in production; tune based on your request volume and storage budget.
    sampling_percentage: 10
  batch:
    send_batch_size: 512
    timeout: 5s

exporters:
  otlp/honeycomb:
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: "${env:HONEYCOMB_API_KEY}"
  # Keep a local Jaeger copy for low-latency developer debugging.
  # Jaeger ingests OTLP directly, so it uses the generic otlp exporter.
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter first, sampling next, batch last.
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [otlp/jaeger, otlp/honeycomb]
```

Key points:

  • memory_limiter must come first in the processor chain; it protects against spike load causing OOMs.
  • send_batch_size: 512 / timeout: 5s: export a batch when it hits 512 spans or 5 seconds have elapsed, whichever comes first. These are not the upstream defaults (8192 spans and 200ms); treat them as a starting point and tune for your volume.
  • probabilistic_sampler is in otel/opentelemetry-collector-contrib (not the core image), which is why the docker-compose.yml uses collector-contrib.
  • Swap otlp/honeycomb for any other OTLP-compatible backend—Grafana Tempo, Datadog, New Relic, SigNoz—by changing the exporter block. Your service code never changes.
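Both numeric knobs are easy to reason about with a small model. The sketch below mimics the batch processor's flush rule (a size threshold or a timeout, whichever comes first) and computes memory_limiter's soft limit, which the collector documents as limit_mib minus spike_limit_mib. `Batcher` and `soft_limit_mib` are hypothetical Python stand-ins, not the collector's Go implementation, and the batcher takes an injected clock so it can be tested deterministically.

```python
from typing import Callable, List


def soft_limit_mib(limit_mib: int, spike_limit_mib: int) -> int:
    """memory_limiter starts refusing data above limit_mib - spike_limit_mib."""
    if not 0 < spike_limit_mib < limit_mib:
        raise ValueError("spike_limit_mib must be positive and below limit_mib")
    return limit_mib - spike_limit_mib


class Batcher:
    """Flush when send_batch_size items are queued or timeout_s has elapsed."""

    def __init__(self, send_batch_size: int, timeout_s: float,
                 clock: Callable[[], float],
                 export: Callable[[List[str]], None]) -> None:
        self.size = send_batch_size
        self.timeout_s = timeout_s
        self.clock = clock
        self.export = export
        self.pending: List[str] = []
        self.last_flush = clock()

    def add(self, span: str) -> None:
        self.pending.append(span)
        if len(self.pending) >= self.size:
            self.flush()  # size threshold reached

    def tick(self) -> None:
        # Called periodically; ships a partial batch once the timeout passes.
        if self.pending and self.clock() - self.last_flush >= self.timeout_s:
            self.flush()

    def flush(self) -> None:
        self.export(self.pending)
        self.pending = []
        self.last_flush = self.clock()
```

With the config above (limit_mib: 512, spike_limit_mib: 128) the soft limit works out to 384 MiB. Feeding 1,000 spans into a Batcher(512, 5.0, ...) exports one full batch of 512 immediately and the remaining 488 on the first tick at or after the 5-second mark.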

Environment Variables Cheat Sheet

Keeping config out of code makes deployments cleaner. OpenTelemetry's SDKs respect these natively:

| Variable | Used by | Example value |
|---|---|---|
| OTEL_SERVICE_NAME | All SDKs | django-backend |
| OTEL_EXPORTER_OTLP_ENDPOINT | All SDKs | http://otel-collector:4318 |
| OTEL_TRACES_SAMPLER | All SDKs | parentbased_traceidratio |
| OTEL_TRACES_SAMPLER_ARG | All SDKs | 0.1 (10%) |
| OTEL_RESOURCE_ATTRIBUTES | All SDKs | deployment.environment=production |

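Setting OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.1 applies a 10% head-based sampler at the SDK level, which is useful when you want to reduce volume before spans even reach the collector. If you do this, revisit the collector's probabilistic_sampler as well: two independent 10% filters can compound to roughly 1% of traces.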
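The parentbased_ part is what keeps a trace whole under head sampling: only the service that starts a trace rolls the dice, and every downstream service copies the sampled flag it received in traceparent. The sketch below models that rule; the ratio check compares the low 64 bits of the trace ID against a bound, in the spirit of the Python SDK's TraceIdRatioBased sampler, but `should_sample` and `ratio_bound` are hypothetical helpers rather than SDK API.

```python
import random
from typing import Optional

TRACE_ID_LIMIT = (1 << 64) - 1


def ratio_bound(rate: float) -> int:
    # Trace IDs whose low 64 bits fall below this bound are kept.
    return round(rate * (TRACE_ID_LIMIT + 1))


def should_sample(trace_id: int, rate: float, parent_sampled: Optional[bool]) -> bool:
    """Model of parentbased_traceidratio.

    parent_sampled is None for a root span (no incoming traceparent);
    otherwise it is the sampled flag the caller sent.
    """
    if parent_sampled is not None:
        return parent_sampled  # follow the upstream decision
    return (trace_id & TRACE_ID_LIMIT) < ratio_bound(rate)


if __name__ == "__main__":
    rng = random.Random(1)
    roots = [rng.getrandbits(128) for _ in range(10_000)]
    kept = sum(should_sample(t, 0.1, None) for t in roots)
    print(f"root decisions kept: {kept / len(roots):.1%}")
```

Only Next.js, as the entry point, hits the ratio branch; FastAPI and Django take the first branch and inherit its verdict, so a trace is either recorded end to end or not at all.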


Production Considerations

  • Replace localhost collector URLs with your actual endpoint using environment variables per service. Never hardcode hostnames.
  • Use BatchSpanProcessor (as shown above) rather than SimpleSpanProcessor; the latter blocks the request thread and will hurt your p99 latency.
  • Choose your sampling strategy deliberately. Head-based sampling (SDK-level) is simpler but can drop interesting rare traces. Tail-based sampling (collector-level, via tailsamplingprocessor) lets you keep all traces with errors regardless of sample rate—worth the added complexity at scale.
  • Add custom spans around business logic using tracer.start_as_current_span() in Python or tracer.startActiveSpan() in TypeScript. HTTP spans alone won't tell you which function is slow.
  • Propagate traceId into your structured logs. If you're already using structured logging across this stack, injecting the current traceId and spanId into every log line lets you jump from a trace waterfall directly to the relevant log lines.
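For the log-correlation point, a common approach in Python is a logging.Filter that stamps every record with the active trace and span IDs. The sketch below keeps the filter free of OpenTelemetry imports by taking a `get_ids` callable (a hypothetical seam); in a real service that callable would read trace.get_current_span().get_span_context() and return its trace_id and span_id, which the filter formats as 32 and 16 hex digits.

```python
import io
import logging
import sys
from typing import Callable, Optional, TextIO, Tuple

# Returns (trace_id, span_id) as ints, or None when no span is active.
IdGetter = Callable[[], Optional[Tuple[int, int]]]


class TraceContextFilter(logging.Filter):
    """Stamp every record with trace_id / span_id for log-to-trace lookups."""

    def __init__(self, get_ids: IdGetter) -> None:
        super().__init__()
        self.get_ids = get_ids

    def filter(self, record: logging.LogRecord) -> bool:
        ids = self.get_ids()
        if ids:
            record.trace_id = f"{ids[0]:032x}"
            record.span_id = f"{ids[1]:016x}"
        else:
            # No active span: placeholders keep the format string valid.
            record.trace_id = "-"
            record.span_id = "-"
        return True  # annotate only; never drop a record


def build_logger(name: str, stream: TextIO, get_ids: IdGetter) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output in this handler only
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"))
    handler.addFilter(TraceContextFilter(get_ids))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    fake_ids = lambda: (0x4bf92f3577b34da6a3ce929d0e0e4736, 0x00f067aa0ba902b7)
    build_logger("orders-demo", sys.stdout, fake_ids).info("order placed")
```

Searching your log store for the trace_id shown in Jaeger then returns every line written while that request was in flight.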

Final Thought

End-to-end distributed tracing across a polyglot stack is one of the highest-leverage things you can do for a production system. You stop guessing where latency lives and start having evidence. OpenTelemetry's instrumentation libraries handle the heavy lifting; your job is wiring the collector, configuring sensible sampling, and remembering to call inject() on outbound Python HTTP requests. That last bit trips up more engineers than anything else—and now you know why.

Damian Hodgkiss


Senior Staff Engineer at Sumo Group, leading development of AppSumo marketplace. Technical solopreneur with 25+ years of experience building SaaS products.
