
We carry the traffic. You ship.

SSE / streaming proxy for LLMs

Stream your model’s tokens through one endpoint.

Point your streaming model or inference target at EchoRelay. We forward its SSE or chunked output to your caller as it’s produced (incremental, not buffered), while authentication and rate limits stay at the edge.

Why EchoRelay

Less to build, less to run.

Chunks, as they’re produced

SSE or chunked transfer, passed straight through to your caller token by token — no waiting for the whole response.
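
To make "passed straight through" concrete, here is a minimal sketch in plain Python generators. It is not EchoRelay's implementation: `relay_buffered`, `relay_incremental`, `fake_model`, and `run` are hypothetical names, and the upstream is a stand-in that emits three SSE frames. The point is only the ordering: an incremental relay delivers each chunk before the next one is produced, while a buffering relay delivers nothing until the upstream is done.

```python
from typing import Callable, Iterable, Iterator, List


def relay_buffered(upstream: Iterable[bytes]) -> Iterator[bytes]:
    """Anti-pattern: collect the whole response, then send one body."""
    yield b"".join(upstream)


def relay_incremental(upstream: Iterable[bytes]) -> Iterator[bytes]:
    """Pass-through: hand each chunk to the caller as soon as it arrives."""
    for chunk in upstream:
        yield chunk


def fake_model(log: List[str]) -> Iterator[bytes]:
    """Hypothetical upstream: emits three SSE frames and logs each one."""
    for token in ("Hel", "lo", "!"):
        log.append(f"produced {token}")
        yield f"data: {token}\n\n".encode()


def run(relay: Callable[[Iterable[bytes]], Iterator[bytes]]) -> List[str]:
    """Drive a relay and record the order of produce/deliver events."""
    log: List[str] = []
    for chunk in relay(fake_model(log)):
        log.append(f"delivered {len(chunk)} bytes")
    return log
```

Calling `run(relay_incremental)` interleaves "produced" and "delivered" entries, whereas `run(relay_buffered)` logs all three "produced" entries before a single delivery.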

Auth & limits at the edge

Your model stays behind us: we authenticate the caller and apply rate limits before the stream opens.
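
The ordering described here (authenticate, then rate-limit, then open the stream) can be sketched as follows. This is an illustration under stated assumptions, not EchoRelay's actual code or API: `TokenBucket`, `Rejected`, and `open_stream` are hypothetical names, a token bucket is just one common rate-limiting scheme, and the 401/429 status codes are the conventional HTTP choices rather than documented behaviour.

```python
import time
from typing import Callable, Dict, Iterator, Set


class TokenBucket:
    """Simple token bucket: `capacity` burst, refilled at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        """Refill for the elapsed time, then try to spend one token."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Rejected(Exception):
    """Raised before any upstream connection is made."""

    def __init__(self, status: int, reason: str) -> None:
        super().__init__(reason)
        self.status = status


def open_stream(api_key: str,
                known_keys: Set[str],
                buckets: Dict[str, TokenBucket],
                connect_upstream: Callable[[], Iterator[bytes]]) -> Iterator[bytes]:
    """Run the edge checks eagerly, then return the upstream chunk iterator.

    This is deliberately a plain function, not a generator, so a rejected
    caller fails here and the model is never contacted.
    """
    if api_key not in known_keys:
        raise Rejected(401, "unknown API key")
    if not buckets[api_key].allow():
        raise Rejected(429, "rate limit exceeded")
    return connect_upstream()
```

Because both checks run before `connect_upstream()` is called, a rejected request costs the model nothing.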

Set up by your agent, over MCP

Describe the streaming endpoint to your AI agent — it speaks EchoRelay’s MCP surface and ships the config.

Questions

Answered, honestly.

What is an SSE proxy for LLM streaming?
It is a layer between your callers and your model that forwards a streaming (SSE or chunked) response to the caller as the model produces it, while handling authentication and rate limiting at the edge.

Does the proxy buffer the whole response?
No. Chunks are forwarded as your target emits them, so the caller sees tokens as they’re generated rather than waiting for the full response.

Does it work for non-LLM streams?
Yes. Any SSE or chunked HTTP response works. LLM token streaming is the common case, but long-running inference and progress streams behave the same way.
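
One practical consequence for callers of any streamed SSE response: network chunk boundaries need not line up with SSE event boundaries, so the reader has to reassemble events itself. The sketch below shows a minimal caller-side parser. It is an illustration rather than anything EchoRelay ships: `sse_data` is a hypothetical helper, and it is deliberately simplified (it reads only `data:` fields, skips comments, and treats `\n` or `\r\n` as line endings, not a bare `\r`).

```python
import codecs
from typing import Iterable, Iterator, List


def sse_data(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each SSE event from raw byte chunks."""
    # Incremental decoder so a multi-byte character split across two
    # chunks is decoded correctly.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    data_lines: List[str] = []

    for chunk in chunks:
        buffer += decoder.decode(chunk)
        # Process every complete line currently in the buffer.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.rstrip("\r")

            if line == "":
                # Blank line ends the event; emit it if it carried data.
                if data_lines:
                    yield "\n".join(data_lines)
                    data_lines = []
            elif line.startswith(":"):
                # Comment line, often used as a keep-alive.
                continue
            elif line.startswith("data:"):
                value = line[len("data:"):]
                # A single leading space after the colon is not part
                # of the value.
                if value.startswith(" "):
                    value = value[1:]
                data_lines.append(value)
            # Other fields (event, id, retry) are ignored in this sketch.
```

For example, the chunks `b"data: Hel"` and `b"lo\n\n"` yield the single payload `"Hello"`, even though the event arrived in two pieces.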

Connect everything. Build nothing.

No credit card required.
