Streaming Structured Output from Anthropic and OpenAI APIs to a Next.js UI
Production pattern for streaming JSON schemas from LLM APIs into a Next.js UI, handling Anthropic and OpenAI differences without the blank-screen wait.
If you've built anything serious with LLM APIs, you'll know the pattern quickly becomes: call the API, wait, render. That works until your users stare at a blank screen for three seconds. Streaming fixes that — but streaming structured output (JSON, typed objects, validated schemas) is where most tutorials fall apart. This one won't.
I'm going to walk you through a practical, copy-paste-ready pattern for streaming structured output from both Anthropic and OpenAI into a Next.js UI — including how to handle the important differences in how each provider enforces JSON schema compliance.
Why Structured Streaming Is Awkward
Plain text streaming is trivial — you pipe tokens to the UI as they arrive. Structured output is harder because JSON is only valid once complete. Stream half a JSON object and you've got a parse error.
Two approaches exist:
- Stream plain text, parse at the end — lowest latency to first token, but no incremental UI updates on the structure itself.
- Stream with a partial-JSON parser — you can update fields as they arrive, giving a genuinely progressive UI.
Both are useful. I'll cover both, and explain when to reach for each.
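As a minimal sketch of the first approach, the helper below accumulates chunks and parses once at the end. The names parseAtEnd, fakeStream, and isCompleteJson are hypothetical, and fakeStream merely stands in for a provider stream split at arbitrary boundaries.

```typescript
// Hypothetical helper: collect every chunk, then parse once the stream ends.
async function parseAtEnd<T>(chunks: AsyncIterable<string>): Promise<T> {
  let accumulated = "";
  for await (const chunk of chunks) {
    accumulated += chunk; // the raw text could also be shown to the user here
  }
  return JSON.parse(accumulated) as T;
}

// Stand-in for a provider stream: one JSON document cut at arbitrary points.
async function* fakeStream(): AsyncGenerator<string> {
  yield '{"summary":"Stre';
  yield 'aming works","keyPoints":["fast"';
  yield ',"typed"]}';
}

// Returns true only when the text is complete, valid JSON.
function isCompleteJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}
```

Any prefix of the document fails isCompleteJson until the final chunk lands, which is exactly why the second approach needs a parser that tolerates incomplete input.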
How Each Provider Handles Structured Output
Before writing a line of code, it's worth understanding where each provider sits on the schema-enforcement spectrum — because they've moved significantly.
Anthropic: Native Structured Outputs (Generally Available)
Anthropic now offers native structured output via the output_config.format parameter (generally available across Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5, and several other models). This uses constrained decoding to guarantee schema compliance — not prompting.
Important nuance: Anthropic's structured outputs are guaranteed to be schema-valid at the end, but the streaming chunks still arrive as raw text fragments — partial JSON tokens. You still need a partial-JSON parser to do anything meaningful during the stream. The schema guarantee only saves you from needing a Zod validation fallback at completion time.
OpenAI: JSON Mode vs. Structured Outputs
OpenAI offers two separate mechanisms:
- response_format: { type: 'json_object' } guarantees valid JSON but does not enforce a specific schema. The model decides the shape.
- response_format: { type: 'json_schema', json_schema: { ... } } gives full structured outputs with schema enforcement, available on gpt-4o and later models.
For production use, prefer json_schema over json_object. Either way, streamed chunks are still raw JSON fragments.
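To make the difference concrete, here is a sketch of the two request bodies side by side, built from one shared schema. The summary/keyPoints shape, the schema name summary_result, the model ids, and the helper names are assumptions for illustration. The exact nesting under output_config follows the description above and should be checked against the current Anthropic documentation.

```typescript
// JSON Schema shared by both providers. OpenAI's strict mode expects every
// property to be listed in `required` and `additionalProperties: false`;
// the same schema is assumed to be acceptable to Anthropic.
const summarySchema = {
  type: "object",
  properties: {
    summary: { type: "string" },
    keyPoints: { type: "array", items: { type: "string" } },
  },
  required: ["summary", "keyPoints"],
  additionalProperties: false,
} as const;

// Hypothetical builder for an Anthropic Messages request body.
function anthropicBody(prompt: string) {
  return {
    model: "claude-sonnet-4-5", // assumed model id
    max_tokens: 1024,
    stream: true,
    messages: [{ role: "user", content: prompt }],
    output_config: { format: { type: "json_schema", schema: summarySchema } },
  };
}

// Hypothetical builder for an OpenAI Chat Completions request body.
function openaiBody(prompt: string) {
  return {
    model: "gpt-4o",
    stream: true,
    messages: [{ role: "user", content: prompt }],
    response_format: {
      type: "json_schema",
      json_schema: { name: "summary_result", strict: true, schema: summarySchema },
    },
  };
}
```

Keeping the schema in one place means a field added later reaches both providers at once.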
The Stack
- Next.js 14+ with App Router
- Anthropic SDK (@anthropic-ai/sdk)
- OpenAI SDK (openai)
- A Route Handler as the streaming endpoint
- partial-json for incremental parsing on the client
- ReadableStream / TextDecoder in the browser
Server Side: Next.js Route Handler
Create app/api/generate/route.ts. This handler accepts a provider param and streams back a structured response.
Key decisions in this implementation:
- Anthropic's output_config.format uses constrained decoding, so the completed response is guaranteed schema-valid. Streaming chunks are still raw text fragments.
- OpenAI's json_schema response format with strict: true enforces the schema via constrained decoding on their side. It is more reliable than json_object for production.
- Error encoding sends a structured error object over the stream rather than killing the connection silently, so the client component can detect and display it.
- API keys stay server-side only. This route never exposes them to the client.
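The decisions above can be sketched without either SDK installed. The block below types only the event fields it reads, turns provider events into text fragments, and wraps them in a byte stream that appends a structured error object on failure. The helper names and ERROR_MARKER are hypothetical, and the marker format is an assumption. In a real handler the events would come from each SDK's streaming call, and the stream would be returned with new Response(stream).

```typescript
// Minimal local types covering only the fields read below.
type AnthropicEvent = { type: string; delta?: { type: string; text?: string } };
type OpenAIChunk = { choices: { delta: { content?: string | null } }[] };

// Text fragment carried by one Anthropic stream event ("" if none).
function anthropicText(event: AnthropicEvent): string {
  const isText = event.type === "content_block_delta" && event.delta?.type === "text_delta";
  return isText ? event.delta?.text ?? "" : "";
}

// Text fragment carried by one OpenAI chat completion chunk ("" if none).
function openaiText(chunk: OpenAIChunk): string {
  return chunk.choices[0]?.delta?.content ?? "";
}

// Adapt any provider event stream to plain text fragments.
async function* mapText<T>(events: AsyncIterable<T>, pick: (event: T) => string): AsyncGenerator<string> {
  for await (const event of events) yield pick(event);
}

// Hypothetical marker the client can split on to find a trailing error object.
const ERROR_MARKER = "\n__STREAM_ERROR__";

// Encode fragments as bytes; on failure, append an error object instead of
// dropping the connection silently.
function toResponseStream(fragments: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const fragment of fragments) {
          if (fragment) controller.enqueue(encoder.encode(fragment));
        }
      } catch (err) {
        const message = err instanceof Error ? err.message : "Unknown error";
        controller.enqueue(encoder.encode(ERROR_MARKER + JSON.stringify({ error: message })));
      } finally {
        controller.close();
      }
    },
  });
}
```

Because only mapText's pick function differs per provider, the provider param reduces to choosing between anthropicText and openaiText.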
Client Side: Progressive Rendering with Partial JSON
Install a lightweight partial-JSON parser with npm install partial-json. The partial-json package (from the promplate project, 241+ stars on GitHub) handles incomplete-prefix parsing gracefully.
Build your component:
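A framework-free sketch of the read loop such a component would run is shown here. consumeJsonStream is a hypothetical name, and the partial parser is injected rather than imported so the block stays self-contained. In the app, parsePartial would wrap the parser from partial-json and onUpdate would be a state setter.

```typescript
// Reads a streamed body and reports the best parse available after each chunk.
// Returns the full accumulated text so it can be validated afterwards.
async function consumeJsonStream<T>(
  body: ReadableStream<Uint8Array>,
  parsePartial: (text: string) => Partial<T> | undefined,
  onUpdate: (value: Partial<T>) => void,
  signal?: AbortSignal,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let accumulated = "";
  try {
    while (true) {
      if (signal?.aborted) {
        await reader.cancel(); // user cancelled: stop reading cleanly
        break;
      }
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true holds back incomplete multi-byte characters between chunks.
      accumulated += decoder.decode(value, { stream: true });
      const parsed = parsePartial(accumulated);
      if (parsed !== undefined) onUpdate(parsed);
    }
    accumulated += decoder.decode(); // flush any buffered bytes
  } finally {
    reader.releaseLock();
  }
  return accumulated;
}
```

Passing an AbortController's signal lets a cancel button stop the loop without leaving the reader locked.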
The partial-json library parses whatever well-formed prefix exists in the accumulating string. You get incremental field updates as tokens arrive — summary renders before keyPoints is complete, and keyPoints items appear one by one as their closing quotes arrive.
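To illustrate the idea of prefix parsing, here is a deliberately naive sketch. It is not the partial-json library's algorithm, and parsePrefix and closingSuffix are hypothetical names. It closes whatever strings, arrays, and objects are still open, and backs off one character at a time until the result parses.

```typescript
// Suffix that would close every string, array, and object left open by `prefix`.
function closingSuffix(prefix: string): string {
  const stack: string[] = [];
  let inString = false;
  let escaped = false;
  for (const ch of prefix) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") stack.push("}");
    else if (ch === "[") stack.push("]");
    else if (ch === "}" || ch === "]") stack.pop();
  }
  return (inString ? '"' : "") + stack.reverse().join("");
}

// Parse the longest prefix of `text` that becomes valid JSON once closed.
// Returns undefined when nothing usable has arrived yet.
function parsePrefix(text: string): unknown {
  for (let end = text.length; end > 0; end--) {
    const prefix = text.slice(0, end);
    try {
      return JSON.parse(prefix + closingSuffix(prefix));
    } catch {
      // Dangling key, trailing comma, or half-written literal: back off one character.
    }
  }
  return undefined;
}
```

This version surfaces half-written strings and numbers as if they were complete and rescans the text on every call, so it is only suitable for understanding the technique. A maintained parser is the better choice in production.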
Approach Comparison: What to Use When
| | Anthropic output_config | OpenAI json_schema | System prompt only |
|---|---|---|---|
| Schema guarantee | ✅ Constrained decoding | ✅ Constrained decoding | ❌ Best-effort |
| Streaming support | ✅ Yes (chunks are text fragments) | ✅ Yes (chunks are text fragments) | ✅ Yes |
| Partial JSON during stream | Needs partial-json parser | Needs partial-json parser | Needs partial-json parser |
| Zod validation needed at end? | Not strictly (schema enforced), still recommended | Not strictly (schema enforced), still recommended | Yes, always |
| Model availability | Claude Opus/Sonnet/Haiku 4.5+ | gpt-4o, gpt-4o-mini | Any model |
For greenfield projects, use native structured outputs on both providers. Retain a system prompt as a secondary signal — it doesn't hurt, and helps older model versions you might target.
Adding Zod Validation (Optional but Recommended)
Even with native structured outputs, running Zod at the end is a good habit — it gives you a TypeScript type for free and catches any edge cases:
Call validateResult(accumulated) after the while loop ends. If it returns null, you know something went wrong at the provider side (extremely rare with constrained decoding, but worth handling).
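A validateResult with that contract could look like the sketch below. To stay dependency-free it uses a hand-written type guard as a stand-in for Zod. With Zod, the guard would be replaced by a schema and its safeParse method. The summary/keyPoints shape is an assumption.

```typescript
type SummaryResult = { summary: string; keyPoints: string[] };

// Hand-written stand-in for a Zod schema check.
function isSummaryResult(value: unknown): value is SummaryResult {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.summary === "string" &&
    Array.isArray(candidate.keyPoints) &&
    candidate.keyPoints.every((item) => typeof item === "string")
  );
}

// Returns the typed result, or null when the text is not valid JSON
// or does not match the expected shape.
function validateResult(raw: string): SummaryResult | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isSummaryResult(parsed) ? parsed : null;
  } catch {
    return null; // truncated or malformed JSON
  }
}
```

Returning null rather than throwing keeps the calling component's control flow simple: one branch for success, one for the error state.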
Gotchas Worth Knowing
Backpressure. A ReadableStream queues chunks internally. LLM token rates (typically 30–80 tokens/sec) are far below what that queue and the browser can absorb, so this rarely bites. In high-throughput scenarios (many concurrent connections), check controller.desiredSize before enqueuing: a value of zero or less means the queue is full and the downstream consumer is falling behind.
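One way to respect backpressure without polling desiredSize by hand is to produce chunks from pull() instead of start(), since the stream only calls pull() when it wants more data. The sketch below assumes the upstream is an async iterable of text fragments. pullStream is a hypothetical name.

```typescript
// Wraps an async iterable so the next fragment is requested only on demand.
function pullStream(source: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const iterator = source[Symbol.asyncIterator]();
  return new ReadableStream<Uint8Array>({
    // Called only while the queue has room or a read is waiting.
    async pull(controller) {
      const { done, value } = await iterator.next();
      if (done) controller.close();
      else controller.enqueue(encoder.encode(value));
    },
    // If the client disconnects, let the upstream iterator clean up too.
    async cancel() {
      await iterator.return?.();
    },
  });
}
```

With the default queuing strategy, a slow consumer leaves at most one chunk buffered per connection rather than the whole response.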
Validation at the end. Partial rendering optimises UX. Always run schema validation (Zod works well here) on the completed string before using the data for anything consequential like database writes or downstream API calls.
Edge Runtime. Move this route to the Edge Runtime (export const runtime = 'edge') for lower cold-start latency. Both SDKs support it, but audit your node: module imports first — anything that depends on Node.js built-ins like fs or crypto (directly or transitively) will break at the Edge.
Error handling and network drops. A network interruption mid-stream leaves you with broken JSON. The pattern above uses AbortController so the user can cancel cleanly. The try/catch in the streaming loop surfaces network errors to the UI. Combine with an error boundary at the component tree level for unhandled cases.
Streaming vs. tool use. Anthropic's structured output via output_config works with streaming. If you're using tool use (function calling) instead, the streaming pattern is different — you'd listen for input_json_delta events on content_block_delta chunks, not text_delta. Those arrive in the same incremental fashion, just on a different event type.
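For the tool-use case, a minimal sketch of collecting input_json_delta fragments per content block is shown here. The event types are trimmed to the fields used, and collectToolCalls is a hypothetical helper that works on already-received events rather than on the SDK's stream object.

```typescript
// Trimmed-down event shapes: only the fields this sketch reads.
type ToolStreamEvent =
  | { type: "content_block_start"; index: number; content_block: { type: string; name?: string } }
  | { type: "content_block_delta"; index: number; delta: { type: string; partial_json?: string; text?: string } }
  | { type: "content_block_stop"; index: number };

type ToolCall = { name: string; input: unknown };

// Accumulates partial_json per block index and parses it when the block stops.
function collectToolCalls(events: Iterable<ToolStreamEvent>): ToolCall[] {
  const open = new Map<number, { name: string; json: string }>();
  const calls: ToolCall[] = [];
  for (const event of events) {
    if (event.type === "content_block_start") {
      if (event.content_block.type === "tool_use") {
        open.set(event.index, { name: event.content_block.name ?? "", json: "" });
      }
    } else if (event.type === "content_block_delta") {
      const block = open.get(event.index);
      if (block && event.delta.type === "input_json_delta") {
        block.json += event.delta.partial_json ?? "";
      }
    } else {
      const block = open.get(event.index);
      if (block) {
        // A tool called with no arguments may stream no JSON at all.
        calls.push({ name: block.name, input: block.json ? JSON.parse(block.json) : {} });
        open.delete(event.index);
      }
    }
  }
  return calls;
}
```

For progressive rendering, the accumulated json string of an open block can be fed to a partial-JSON parser in the same way as the text fragments.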
TypeScript SDK types. The output_config parameter may not be reflected in older versions of the Anthropic SDK's TypeScript types. If you hit a type error, either cast with as any or update to the latest SDK version (npm install @anthropic-ai/sdk@latest).
Closing Thoughts
Streaming structured output sits at an awkward intersection of protocol constraints and UX demands — which is why most implementations either skip the structure or skip the streaming. You don't have to choose.
The pattern above is provider-agnostic, progressively renderable, and straightforward to extend. A few directions worth exploring from here:
- Server-Sent Events (SSE) instead of raw ReadableStream: gives you named event types and automatic reconnection. The site's SSE in Next.js 15 tutorial covers the mechanics.
- Tool use / function calling for multi-step structured extraction, where you stream multiple structured objects in sequence.
- Optimistic UI: render skeleton placeholders for keyPoints items while the array is still being streamed, using parsed.keyPoints?.length ?? 0 to size the skeleton.
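As a small taste of the SSE direction, the sketch below frames a payload as a named event. The event names delta and error are assumptions, and sseEvent is a hypothetical helper. The response would also need the Content-Type: text/event-stream header.

```typescript
// Frame one Server-Sent Event. JSON.stringify never emits raw newlines,
// so the payload always fits on a single data line.
function sseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Example frames a handler might write to the stream.
const deltaFrame = sseEvent("delta", { text: '{"summary":"Hi' });
const errorFrame = sseEvent("error", { message: "upstream dropped" });
```

Named events give errors their own channel, so the client no longer has to look for an error object mixed into the JSON text.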
The foundation above is what I'd reach for in any production feature that needs LLM output to feel fast and be structurally reliable.
Damian Hodgkiss
Senior Staff Engineer at Sumo Group, leading development of AppSumo marketplace. Technical solopreneur with 25+ years of experience building SaaS products.