Streaming

Stream it straight back.

Point a streaming target at EchoRelay and we forward its SSE or chunked response to your caller, chunk by chunk — built for LLM token streaming.

Built for AI

Tokens, chunk by chunk.

SSE & chunked, passed through.

Server-sent events or chunked transfer, forwarded to your caller as your target produces them — no waiting for the whole response.

Made for model output.

Stream an LLM’s tokens or a long-running inference job straight to the caller — incremental, not buffered to the end.

Configured by your agent, over MCP.

Set up the streaming endpoint with a sentence to your agent — it speaks EchoRelay’s MCP surface and ships the config for you.

One stream per endpoint.

A streaming endpoint carries a single stream target — clear, predictable behaviour.
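To make the SSE pass-through concrete, here is a minimal sketch of how a caller might read a forwarded event stream incrementally. It is an illustration, not EchoRelay client code: the function name `iter_sse_data` and the sample chunks are hypothetical. It handles only LF line endings and the `data` field, which is a simplification of the full SSE format.

```python
import codecs
from typing import Iterable, Iterator


def iter_sse_data(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield each SSE event's data payload as soon as the event is complete.

    Simplified sketch: LF line endings only; the "event", "id", and
    "retry" fields are ignored.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        # Chunk boundaries are arbitrary, so decode and buffer incrementally.
        buffer += decoder.decode(chunk)
        # A blank line terminates an event; the remainder stays buffered.
        while "\n\n" in buffer:
            raw_event, buffer = buffer.split("\n\n", 1)
            data_lines = []
            for line in raw_event.split("\n"):
                if line.startswith(":"):
                    continue  # SSE comment line, often used as a keep-alive
                if line.startswith("data:"):
                    value = line[len("data:"):]
                    # One optional leading space after the colon is dropped.
                    data_lines.append(value[1:] if value.startswith(" ") else value)
            if data_lines:
                yield "\n".join(data_lines)


# Simulated network chunks that split events at awkward places.
chunks = [b"data: Hel", b"lo\n\n: ping\n\ndata: wor", b"ld\n\n"]
events = list(iter_sse_data(chunks))
```

Because the parser is a generator, each payload is available to the caller as soon as its terminating blank line arrives, which matches the chunk-by-chunk delivery described above.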

How it works

Open, forward, stream.

Your caller opens a request

A client hits your streaming endpoint and holds the connection open, waiting to receive.

We authenticate and open the stream

We verify your key and apply rate limits at the edge, then open the connection to your streaming target.

Chunks stream straight back

As your target emits SSE or chunked output, we forward each chunk to your caller as it arrives — incremental, not buffered to the end.
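The three steps above can be sketched as a small generator. This is a toy model under stated assumptions, not EchoRelay's implementation: `VALID_KEYS`, `relay_stream`, and `fake_target` are hypothetical names, and rate limiting is left out to keep the sketch short. The log shows that each chunk reaches the caller before the target emits the next one.

```python
from typing import Callable, Iterable, Iterator, List

VALID_KEYS = {"demo-key"}  # hypothetical key store, for illustration only


def relay_stream(
    api_key: str, open_target: Callable[[], Iterable[bytes]]
) -> Iterator[bytes]:
    """Authenticate, open the target stream, then forward chunk by chunk."""
    # Step 2: verify the key before the target is opened.
    if api_key not in VALID_KEYS:
        raise PermissionError("invalid API key")
    # Step 3: yield each chunk as it arrives instead of collecting them all.
    for chunk in open_target():
        yield chunk


log: List[str] = []


def fake_target() -> Iterator[bytes]:
    """Stand-in for a streaming target that emits three tokens."""
    for token in (b"Hel", b"lo", b"!"):
        log.append(f"target emitted {token!r}")
        yield token


# Step 1: the caller opens the request and reads chunks as they come in.
for chunk in relay_stream("demo-key", fake_target):
    log.append(f"caller received {chunk!r}")
```

After the loop, `log` alternates between "target emitted" and "caller received" entries, which is the observable difference between incremental forwarding and buffering to the end.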

Streaming runs through the same durable relay as everything else — your caller gets the stream while we keep auth, validation, and rate limits at the edge. A stream is billed by the size of the response it streams back — heartbeats are free, and a cut-short stream only bills for what was delivered.
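The billing rule can be illustrated with a short sketch. Everything here is an assumption made for the example: the heartbeat format (an SSE comment line), the use of bytes as the unit of size, and the names `forward_and_meter` and `flaky_target`. The source only states that heartbeats are free and that a cut-short stream bills for what was delivered.

```python
from typing import Iterable, Iterator, List

HEARTBEAT = b": heartbeat\n\n"  # assumed heartbeat format (an SSE comment)


def forward_and_meter(chunks: Iterable[bytes], delivered: List[bytes]) -> int:
    """Forward chunks into `delivered` and return the billable byte count.

    Heartbeats are forwarded but not counted. If the stream breaks, only
    the bytes delivered before the break are counted.
    """
    billable = 0
    try:
        for chunk in chunks:
            delivered.append(chunk)
            if chunk != HEARTBEAT:
                billable += len(chunk)
    except ConnectionError:
        pass  # stream cut short: keep what was delivered, bill only that
    return billable


def flaky_target() -> Iterator[bytes]:
    """Stand-in target that fails after two data chunks and one heartbeat."""
    yield b"data: Hello\n\n"
    yield HEARTBEAT
    yield b"data: wor\n\n"
    raise ConnectionError("target went away")


delivered: List[bytes] = []
billed = forward_and_meter(flaky_target(), delivered)
```

In this run the caller keeps all three forwarded chunks, while `billed` counts only the two data chunks that arrived before the connection dropped.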

Questions

Streaming, answered.

Does EchoRelay buffer the whole response before sending it?

No. Chunks are forwarded as your target produces them, so your caller sees output as it is generated rather than waiting for the full response.

What happens if a stream is cut short?

Your caller keeps everything delivered up to the cut, and billing only counts what was actually streamed back. Heartbeats are not charged.

Can a streaming endpoint fan out to several targets?

No. A streaming endpoint carries a single stream target, for clear and predictable behaviour. Delivering one request to many targets is a separate shape.

Is streaming only for LLMs?

No. Any SSE or chunked HTTP response works. LLM token streaming is the common case, but long-running inference and progress streams behave the same way.

Not the right shape? To deliver one request to many targets at once, use Fan-out. To take inbound traffic while keeping your origin off the public internet, use Origin shield.

Put us between your model and your users.

Start free

Fan-out

No credit card required.