Streaming Structured Output from Anthropic and OpenAI APIs to a Next.js UI

Production pattern for streaming JSON schemas from LLM APIs into a Next.js UI, handling Anthropic and OpenAI differences without the blank-screen wait.

nextjs, ai, typescript

If you've built anything serious with LLM APIs, you'll know the pattern quickly becomes: call the API, wait, render. That works until your users stare at a blank screen for three seconds. Streaming fixes that — but streaming structured output (JSON, typed objects, validated schemas) is where most tutorials fall apart. This one won't.

I'm going to walk you through a practical, copy-paste-ready pattern for streaming structured output from both Anthropic and OpenAI into a Next.js UI — including how to handle the important differences in how each provider enforces JSON schema compliance.


Why Structured Streaming Is Awkward

Plain text streaming is trivial — you pipe tokens to the UI as they arrive. Structured output is harder because JSON is only valid once complete. Stream half a JSON object and you've got a parse error.
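To see the problem concretely, here is a minimal sketch that cuts a JSON payload off partway, the way a stream would deliver it mid-response. The tryParse helper and the sample payload are illustrative only and are not part of either SDK.

```typescript
// Hypothetical helper: returns the parsed value, or null when the text is not valid JSON yet.
function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

// Illustrative payload; a real one would come from the model.
const complete = '{"summary":"Growth is steady","sentiment":"positive"}';

// Simulate the stream having delivered only the first 25 characters.
const truncated = complete.slice(0, 25);

console.log(tryParse(truncated)); // null, because the prefix is not valid JSON
console.log(tryParse(complete)); // the full object parses fine
```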

Two approaches exist:

  1. Stream plain text, parse at the end: the simplest option. Time to first token is the same as with any stream, but the structured UI only updates once the whole object has arrived.
  2. Stream with a partial-JSON parser — you can update fields as they arrive, giving a genuinely progressive UI.

Both are useful. I'll cover both, and explain when to reach for each.
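To make the second approach less abstract, here is a rough sketch of the idea behind partial-JSON parsing: close whatever strings, arrays, and objects are still open, then hand the result to JSON.parse. The parsePartial function is a hypothetical, simplified stand-in written for this post, not the API of any library, and it gives up (returns undefined) on prefixes it cannot repair, such as a dangling key.

```typescript
// Simplified sketch: repair a JSON prefix by closing open strings and brackets.
// Returns undefined when the repaired text still is not valid JSON.
function parsePartial(text: string): unknown {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') closers.push('}');
    else if (ch === '[') closers.push(']');
    else if (ch === '}' || ch === ']') closers.pop();
  }

  let candidate = text;
  // A trailing backslash inside a string would escape our closing quote, so drop it.
  if (inString && escaped) candidate = candidate.slice(0, -1);
  if (inString) candidate += '"';
  // Close the innermost structure first.
  candidate += closers.reverse().join('');

  try {
    return JSON.parse(candidate);
  } catch {
    return undefined; // e.g. a key with no value yet; the caller keeps the last good snapshot
  }
}

// Feed illustrative chunks in, the way a stream reader would.
const chunks = ['{"summary":"Gro', 'wth is steady","keyPoints":["Rev', 'enue up"]}'];
let buffer = '';
for (const chunk of chunks) {
  buffer += chunk;
  console.log(parsePartial(buffer));
}
```

A production parser handles more cases (partial numbers, literals, configurable strictness), which is why the rest of this post uses a library instead of this sketch.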


How Each Provider Handles Structured Output

Before writing a line of code, it's worth understanding where each provider sits on the schema-enforcement spectrum — because they've moved significantly.

Anthropic: Native Structured Outputs (Generally Available)

Anthropic now offers native structured output via the output_config.format parameter (generally available across Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5, and several other models). This uses constrained decoding to guarantee schema compliance — not prompting.

const response = await anthropic.messages.create({
  model: 'claude-opus-4-5',
  max_tokens: 1024,
  messages: [{ role: 'user', content: prompt }],
  output_config: {
    format: {
      type: 'json_schema',
      schema: {
        type: 'object',
        properties: {
          summary: { type: 'string' },
          keyPoints: { type: 'array', items: { type: 'string' } },
          sentiment: { type: 'string', enum: ['positive', 'neutral', 'negative'] },
        },
        required: ['summary', 'keyPoints', 'sentiment'],
        // Structured outputs expect closed objects
        additionalProperties: false,
      },
    },
  },
});

Important nuance: the schema guarantee applies to a response that runs to completion. The streaming chunks still arrive as raw text fragments (partial JSON), so you still need a partial-JSON parser to do anything meaningful during the stream. A response that is cut short, for example by hitting max_tokens or by a refusal, can also leave you with incomplete or non-matching JSON, so keep a validation step at completion time.

OpenAI: JSON Mode vs. Structured Outputs

OpenAI offers two separate mechanisms:

  • response_format: { type: 'json_object' } — guarantees valid JSON but does not enforce a specific schema. The model decides the shape.
  • response_format: { type: 'json_schema', json_schema: { ... } } — full structured outputs with schema enforcement, available on gpt-4o and later models.

For production use, prefer json_schema over json_object. Either way, streamed chunks are still raw JSON fragments.
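One practical wrinkle with strict: true: as I read OpenAI's documented requirements, every object in the schema needs additionalProperties: false and must list all of its properties in required. The sketch below applies that mechanically. The JsonSchema type and the toStrictSchema helper are hypothetical names for illustration, and the type covers only the handful of keywords used in this post.

```typescript
// Minimal schema type covering only the keywords used in this post.
type JsonSchema = {
  type?: string;
  properties?: Record<string, JsonSchema>;
  items?: JsonSchema;
  required?: string[];
  additionalProperties?: boolean;
  enum?: string[];
};

// Returns a copy in which every object is closed and all of its properties are required.
function toStrictSchema(schema: JsonSchema): JsonSchema {
  const out: JsonSchema = { ...schema };
  if (schema.type === 'object' && schema.properties) {
    const props: Record<string, JsonSchema> = {};
    for (const key of Object.keys(schema.properties)) {
      props[key] = toStrictSchema(schema.properties[key]);
    }
    out.properties = props;
    out.required = Object.keys(props);
    out.additionalProperties = false;
  }
  // Arrays of objects need the same treatment for their item schema.
  if (schema.items) out.items = toStrictSchema(schema.items);
  return out;
}

const loose: JsonSchema = {
  type: 'object',
  properties: {
    summary: { type: 'string' },
    keyPoints: { type: 'array', items: { type: 'string' } },
  },
};

console.log(JSON.stringify(toStrictSchema(loose), null, 2));
```

Note that this makes every field required; if you need an optional field under strict mode, you would model it differently (for example as a nullable type) instead of leaving it out of required.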


The Stack

  • Next.js 14+ with App Router
  • Anthropic SDK (@anthropic-ai/sdk)
  • OpenAI SDK (openai)
  • A Route Handler as the streaming endpoint
  • partial-json for incremental parsing on the client
  • ReadableStream / TextDecoder in the browser

Server Side: Next.js Route Handler

Create app/api/generate/route.ts. This handler accepts a provider param and streams back a structured response.

import Anthropic from '@anthropic-ai/sdk';
import OpenAI from 'openai';
import { NextRequest } from 'next/server';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Shared schema definition, used by both providers.
// Strict structured outputs expect closed objects, hence additionalProperties: false.
const OUTPUT_SCHEMA = {
  type: 'object' as const,
  properties: {
    summary: { type: 'string' },
    keyPoints: { type: 'array', items: { type: 'string' } },
    sentiment: { type: 'string', enum: ['positive', 'neutral', 'negative'] },
  },
  required: ['summary', 'keyPoints', 'sentiment'],
  additionalProperties: false,
};

// System prompt kept as a secondary signal alongside schema enforcement
const SYSTEM_PROMPT = `You are a product analyst. Always respond with valid JSON matching this schema:
{ "summary": string, "keyPoints": string[], "sentiment": "positive" | "neutral" | "negative" }
Return only valid JSON with no additional text.`;

export async function POST(req: NextRequest) {
  const { prompt, provider } = await req.json();

  // One encoder for the whole response instead of one per chunk
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      const encode = (text: string) =>
        controller.enqueue(encoder.encode(text));

      try {
        if (provider === 'anthropic') {
          // Use native structured outputs with output_config
          const response = await anthropic.messages.stream({
            model: 'claude-opus-4-5',
            max_tokens: 1024,
            messages: [{ role: 'user', content: prompt }],
            // @ts-expect-error: output_config may not yet be reflected in older SDK types
            output_config: {
              format: {
                type: 'json_schema',
                schema: OUTPUT_SCHEMA,
              },
            },
          });

          // Forward only the text deltas; other event types carry no output text
          for await (const chunk of response) {
            if (
              chunk.type === 'content_block_delta' &&
              chunk.delta.type === 'text_delta'
            ) {
              encode(chunk.delta.text);
            }
          }
        } else {
          // OpenAI: use json_schema for full schema enforcement
          const response = await openai.chat.completions.create({
            model: 'gpt-4o',
            stream: true,
            response_format: {
              type: 'json_schema',
              json_schema: {
                name: 'analysis_result',
                strict: true,
                schema: OUTPUT_SCHEMA,
              },
            },
            messages: [
              { role: 'system', content: SYSTEM_PROMPT },
              { role: 'user', content: prompt },
            ],
          });

          for await (const chunk of response) {
            const delta = chunk.choices[0]?.delta?.content ?? '';
            if (delta) encode(delta);
          }
        }
      } catch (err) {
        // Encode the error as a JSON error object so the client can handle it
        encode(JSON.stringify({ error: err instanceof Error ? err.message : 'Stream error' }));
      } finally {
        controller.close();
      }
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      'X-Content-Type-Options': 'nosniff',
    },
  });
}

Key decisions in this implementation:

  • Anthropic's output_config.format uses constrained decoding — the completed response is guaranteed schema-valid. Streaming chunks are still raw text fragments.
  • OpenAI's json_schema response format with strict: true enforces the schema via constrained decoding on their side. More reliable than json_object for production.
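  • Error encoding sends a structured error object over the stream rather than killing the connection silently, so the client component can detect and display it. This only parses cleanly when the failure happens before any output has been sent; an error object appended after partial JSON is likely to leave the accumulated text unparseable, so treat mid-stream failures as a separate case.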
  • API keys stay server-side only. This route never exposes them to the client.

Client Side: Progressive Rendering with Partial JSON

Install a lightweight partial-JSON parser. The partial-json package (from the promplate project, 241+ stars on GitHub) handles the incomplete-prefix parsing gracefully:

npm install partial-json

Build your component:

'use client';

import { useState, useRef } from 'react';
import { parse } from 'partial-json';

interface AnalysisResult {
  summary?: string;
  keyPoints?: string[];
  sentiment?: 'positive' | 'neutral' | 'negative';
  error?: string;
}

const sentimentColour: Record<string, string> = {
  positive: '#16a34a',
  neutral: '#ca8a04',
  negative: '#dc2626',
};

export default function AnalysisStream() {
  const [raw, setRaw] = useState('');
  const [parsed, setParsed] = useState<AnalysisResult>({});
  const [isStreaming, setIsStreaming] = useState(false);
  const abortRef = useRef<AbortController | null>(null);

  async function runAnalysis(provider: 'anthropic' | 'openai') {
    // Cancel any in-flight request
    abortRef.current?.abort();
    abortRef.current = new AbortController();

    setRaw('');
    setParsed({});
    setIsStreaming(true);

    try {
      const res = await fetch('/api/generate', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt: 'Analyse our Q3 growth strategy', provider }),
        signal: abortRef.current.signal,
      });

      if (!res.ok) {
        throw new Error(`HTTP ${res.status}: ${res.statusText}`);
      }

      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      let accumulated = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        accumulated += decoder.decode(value, { stream: true });
        setRaw(accumulated);

        try {
          const partial = parse(accumulated);
          // Detect server-side error object
          if (partial?.error) {
            setParsed({ error: partial.error });
            break;
          }
          setParsed(partial);
        } catch {
          // Not enough tokens yet, keep accumulating
        }
      }
    } catch (err) {
      if (err instanceof Error && err.name !== 'AbortError') {
        setParsed({ error: err.message });
      }
    } finally {
      setIsStreaming(false);
    }
  }

  function cancel() {
    abortRef.current?.abort();
    setIsStreaming(false);
  }

  return (
    <div style={{ fontFamily: 'sans-serif', maxWidth: 640, margin: '0 auto', padding: 24 }}>
      <div style={{ display: 'flex', gap: 8, marginBottom: 16 }}>
        <button
          onClick={() => runAnalysis('anthropic')}
          disabled={isStreaming}
          style={{ padding: '8px 16px' }}
        >
          Run with Anthropic
        </button>
        <button
          onClick={() => runAnalysis('openai')}
          disabled={isStreaming}
          style={{ padding: '8px 16px' }}
        >
          Run with OpenAI
        </button>
        {isStreaming && (
          <button onClick={cancel} style={{ padding: '8px 16px', color: '#dc2626' }}>
            Cancel
          </button>
        )}
      </div>

      {parsed.error && (
        <p style={{ color: '#dc2626' }}>Error: {parsed.error}</p>
      )}

      {parsed.summary && (
        <p><strong>Summary:</strong> {parsed.summary}</p>
      )}

      {parsed.keyPoints && parsed.keyPoints.length > 0 && (
        <ul>
          {parsed.keyPoints.map((point, i) => (
            <li key={i}>{point}</li>
          ))}
        </ul>
      )}

      {parsed.sentiment && (
        <p>
          Sentiment:{' '}
          <span style={{ color: sentimentColour[parsed.sentiment] ?? '#000', fontWeight: 600 }}>
            {parsed.sentiment}
          </span>
        </p>
      )}

      {isStreaming && <p style={{ opacity: 0.5 }}>Streaming…</p>}

      <details style={{ marginTop: 16 }}>
        <summary style={{ cursor: 'pointer', opacity: 0.5, fontSize: 12 }}>
          Raw stream ({raw.length} chars)
        </summary>
        <pre style={{ opacity: 0.4, fontSize: 11, overflowX: 'auto' }}>{raw}</pre>
      </details>
    </div>
  );
}

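The partial-json library parses whatever well-formed prefix exists in the accumulating string. You get incremental field updates as tokens arrive: summary renders before keyPoints is complete, and keyPoints items appear one by one. With the library's default settings, unfinished strings are returned as well, so text grows as it streams and an enum field like sentiment can briefly hold a partial value. That is why the component falls back to a default colour when the lookup misses.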
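One detail in the read loop that is easy to miss is the { stream: true } option passed to decoder.decode. Network chunks can end in the middle of a multi-byte UTF-8 character, and the option tells the decoder to hold the incomplete bytes until the next chunk. The sketch below uses a made-up two-chunk split to show the difference; the sample word and split point are assumptions chosen to trigger the problem.

```typescript
// "Café" is 5 bytes in UTF-8 because the final character takes two bytes.
const bytes = new TextEncoder().encode('Caf\u00e9');
const first = bytes.slice(0, 4); // ends in the middle of the last character
const second = bytes.slice(4);

// Streaming decode: one decoder, which buffers the incomplete byte between calls.
const streamingDecoder = new TextDecoder();
const streamed =
  streamingDecoder.decode(first, { stream: true }) +
  streamingDecoder.decode(second, { stream: true });

// Naive decode: every chunk is decoded in isolation, so the split character is lost.
const naive = new TextDecoder().decode(first) + new TextDecoder().decode(second);

console.log(streamed === 'Caf\u00e9'); // true
console.log(naive === 'Caf\u00e9'); // false, the split bytes became replacement characters
```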


Approach Comparison: What to Use When

|                            | Anthropic output_config           | OpenAI json_schema                | System prompt only        |
|----------------------------|-----------------------------------|-----------------------------------|---------------------------|
| Schema guarantee           | ✅ Constrained decoding           | ✅ Constrained decoding           | ❌ Best-effort            |
| Streaming support          | ✅ Yes (chunks are text fragments) | ✅ Yes (chunks are text fragments) | ✅ Yes                    |
| Partial JSON during stream | Needs partial-json parser         | Needs partial-json parser         | Needs partial-json parser |
| Zod validation at end?     | Optional (schema enforced)        | Optional (schema enforced)        | Yes, always               |
| Model availability         | Claude Opus/Sonnet/Haiku 4.5+     | gpt-4o, gpt-4o-mini               | Any model                 |

For greenfield projects, use native structured outputs on both providers. Retain a system prompt as a secondary signal — it doesn't hurt, and helps older model versions you might target.


Adding Zod Validation (Optional but Recommended)

Even with native structured outputs, running Zod at the end is a good habit — it gives you a TypeScript type for free and catches any edge cases:

import { z } from 'zod';

const AnalysisSchema = z.object({
  summary: z.string(),
  keyPoints: z.array(z.string()),
  sentiment: z.enum(['positive', 'neutral', 'negative']),
});

type Analysis = z.infer<typeof AnalysisSchema>;

// After stream completes, in your component:
function validateResult(raw: string): Analysis | null {
  try {
    return AnalysisSchema.parse(JSON.parse(raw));
  } catch {
    return null;
  }
}

Call validateResult(accumulated) after the while loop ends. If it returns null, you know something went wrong at the provider side (extremely rare with constrained decoding, but worth handling).
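If you would rather not add a dependency for a schema this small, a hand-written type guard can play the same role. The sketch below is a dependency-free stand-in for the Zod version, not a replacement for it on larger schemas; AnalysisShape, isAnalysis, and validateWithoutZod are hypothetical names introduced here.

```typescript
interface AnalysisShape {
  summary: string;
  keyPoints: string[];
  sentiment: 'positive' | 'neutral' | 'negative';
}

// Runtime check that mirrors the schema used throughout this post.
function isAnalysis(value: unknown): value is AnalysisShape {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.summary === 'string' &&
    Array.isArray(v.keyPoints) &&
    v.keyPoints.every((point) => typeof point === 'string') &&
    (v.sentiment === 'positive' || v.sentiment === 'neutral' || v.sentiment === 'negative')
  );
}

// Returns the typed result, or null for truncated JSON or a shape mismatch.
function validateWithoutZod(raw: string): AnalysisShape | null {
  try {
    const data: unknown = JSON.parse(raw);
    return isAnalysis(data) ? data : null;
  } catch {
    return null;
  }
}

console.log(validateWithoutZod('{"summary":"ok","keyPoints":["a"],"sentiment":"neutral"}'));
console.log(validateWithoutZod('{"summary":"ok","keyPoints":["a"')); // null: the stream was cut off
```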


Gotchas Worth Knowing

Backpressure. A ReadableStream queues chunks internally. LLM token rates (typically 30–80 tokens/sec) are low enough that this rarely bites. In high-throughput scenarios (many concurrent connections), check controller.desiredSize before enqueuing: a value of zero or below means the queue is full and the downstream consumer is falling behind.
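To get a feel for how desiredSize behaves, here is a small sketch that enqueues a few chunks with no reader attached and records the value after each one. The highWaterMark of 2 and the sample tokens are arbitrary choices for the demonstration.

```typescript
const sizes: Array<number | null> = [];

// No reader is attached, so nothing drains the queue while we enqueue.
new ReadableStream<string>(
  {
    start(controller) {
      sizes.push(controller.desiredSize); // full capacity before anything is queued
      for (const token of ['{"a"', ':', '1}']) {
        controller.enqueue(token);
        sizes.push(controller.desiredSize); // shrinks by one per queued chunk
      }
    },
  },
  { highWaterMark: 2 }
);

console.log(sizes); // [2, 1, 0, -1]
```

In a real handler the usual way to respect this signal is to produce chunks from pull() rather than start(), since the stream only calls pull() when it wants more data.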

Validation at the end. Partial rendering optimises UX. Always run schema validation (Zod works well here) on the completed string before using the data for anything consequential like database writes or downstream API calls.

Edge Runtime. Move this route to the Edge Runtime (export const runtime = 'edge') for lower cold-start latency. Both SDKs support it, but audit your node: module imports first — anything that depends on Node.js built-ins like fs or crypto (directly or transitively) will break at the Edge.

Error handling and network drops. A network interruption mid-stream leaves you with broken JSON. The pattern above uses AbortController so the user can cancel cleanly. The try/catch in the streaming loop surfaces network errors to the UI. Combine with an error boundary at the component tree level for unhandled cases.

Streaming vs. tool use. Anthropic's structured output via output_config works with streaming. If you're using tool use (function calling) instead, the streaming pattern is different — you'd listen for input_json_delta events on content_block_delta chunks, not text_delta. Those arrive in the same incremental fashion, just on a different event type.
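As a rough illustration of that difference, the sketch below folds a list of stream events into two buffers, one per delta type. The event types are declared locally to mirror the shapes described above and are not imported from the SDK. The sample events are invented, and for simplicity the sketch ignores the block index, which you would need in order to keep separate buffers when a response contains several tool calls.

```typescript
// Local types mirroring the event shapes discussed above; not the SDK's own types.
type Delta =
  | { type: 'text_delta'; text: string }
  | { type: 'input_json_delta'; partial_json: string };

type StreamEvent =
  | { type: 'content_block_delta'; index: number; delta: Delta }
  | { type: 'message_stop' };

// Accumulates plain text and tool-input JSON fragments separately.
function collectDeltas(events: StreamEvent[]): { text: string; toolJson: string } {
  let text = '';
  let toolJson = '';
  for (const event of events) {
    if (event.type !== 'content_block_delta') continue;
    const delta = event.delta;
    if (delta.type === 'text_delta') {
      text += delta.text;
    } else {
      toolJson += delta.partial_json;
    }
  }
  return { text, toolJson };
}

// Invented sample events for the demonstration.
const sampleEvents: StreamEvent[] = [
  { type: 'content_block_delta', index: 0, delta: { type: 'text_delta', text: 'Looking that up.' } },
  { type: 'content_block_delta', index: 1, delta: { type: 'input_json_delta', partial_json: '{"city":' } },
  { type: 'content_block_delta', index: 1, delta: { type: 'input_json_delta', partial_json: '"Paris"}' } },
  { type: 'message_stop' },
];

const collected = collectDeltas(sampleEvents);
console.log(collected.text);
console.log(JSON.parse(collected.toolJson));
```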

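TypeScript SDK types. The output_config parameter may not be reflected in older versions of the Anthropic SDK's TypeScript types. If you hit a type error, either cast with as any or update to the latest SDK version (npm install @anthropic-ai/sdk@latest). Note that the @ts-expect-error comment in the route handler will itself become a compile error once your SDK types include output_config, so remove it after upgrading.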


Closing Thoughts

Streaming structured output sits at an awkward intersection of protocol constraints and UX demands — which is why most implementations either skip the structure or skip the streaming. You don't have to choose.

The pattern above is provider-agnostic, progressively renderable, and straightforward to extend. A few directions worth exploring from here:

  • Server-Sent Events (SSE) instead of raw ReadableStream — gives you named event types and automatic reconnection. The site's SSE in Next.js 15 tutorial covers the mechanics.
  • Tool use / function calling for multi-step structured extraction, where you stream multiple structured objects in sequence.
  • Optimistic UI — render skeleton placeholders for keyPoints items while the array is still being streamed, using parsed.keyPoints?.length ?? 0 to size the skeleton.

The foundation above is what I'd reach for in any production feature that needs LLM output to feel fast and be structurally reliable.

Damian Hodgkiss

Senior Staff Engineer at Sumo Group, leading development of AppSumo marketplace. Technical solopreneur with 25+ years of experience building SaaS products.
