Gateway Resiliency - MCP Manager

A gateway in MCP Manager usually fronts several MCP servers at once. When one of them goes down, whether for maintenance, a deploy, or a network blip, the gateway keeps serving every other server and confines the problem to the one that’s down. One server’s outage is never a gateway-wide outage, so you don’t need to take anything else offline or rebuild your setup to ride it out.

What happens when one MCP server is down

The gateway reaches each MCP server independently, so a failure stays contained to the server that failed: its tools become temporarily unavailable while every healthy server in the gateway keeps working. A down server shows up in exactly three ways:

Its tools drop off the list. When your client asks the gateway for available tools, the gateway returns the tools from every server it can reach and omits the one it can’t.
A call to it returns a clear error. A tool call to the offline server gets back a plain-language error naming the server, not a hang and not a failure that spreads to your other tools.
An admin alert is raised. The gateway records an alert naming the failed server and the reason, so an administrator sees the outage on the Alerts page instead of inferring it from a vague client-side error.
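The per-server isolation described above can be sketched as a tool-list builder that queries each server independently, skips any server it can't reach, and records an alert for it. This is a minimal, hypothetical sketch, not MCP Manager's implementation: `Gateway`, `Alert`, and `ServerUnreachable` are illustrative names. Prefixing each tool with its server name only keeps the output readable; how MCP Manager names aggregated tools isn't described here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class ServerUnreachable(Exception):
    """Hypothetical error for a downstream server the gateway can't reach."""


@dataclass
class Alert:
    server: str
    reason: str


@dataclass
class Gateway:
    # Hypothetical: server name -> function returning that server's tool names.
    servers: Dict[str, Callable[[], List[str]]]
    alerts: List[Alert] = field(default_factory=list)

    def list_tools(self) -> List[str]:
        tools: List[str] = []
        for name, fetch in self.servers.items():
            try:
                server_tools = fetch()  # each server is reached independently
            except ServerUnreachable as exc:
                # Contain the failure: record an admin alert, skip this server.
                self.alerts.append(Alert(name, str(exc)))
                continue
            tools.extend(f"{name}.{tool}" for tool in server_tools)
        return tools


def github() -> List[str]:
    return ["create_issue", "search_code"]


def jira() -> List[str]:
    return ["create_ticket"]


def docs() -> List[str]:
    raise ServerUnreachable("connection refused")


gw = Gateway({"GitHub": github, "Jira": jira, "Docs": docs})
print(gw.list_tools())  # ['GitHub.create_issue', 'GitHub.search_code', 'Jira.create_ticket']
print(gw.alerts)        # [Alert(server='Docs', reason='connection refused')]
```

The key design choice is that the `try` wraps a single server's fetch, not the whole loop, so one failure can only remove that server's tools.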

[Diagram: GitHub and Jira keep answering through the gateway while an offline server is contained, producing a client error and an admin alert.]

The Alerts feed is in-app and visible to roles with the See all alerts capability. End users don’t need it to understand an outage — the error they get back already names the affected server.

A server that’s unreachable when you connect

When you connect an MCP client, like Claude, to a gateway, MCP Manager walks you through its servers and gathers what each one needs (see Connection Experience). A server that’s down doesn’t stop you from connecting, because the authorization flow never contacts the server itself. When your client then loads the gateway’s tools, resources, and prompts, an offline server simply contributes none of them. Reconnect once that server is healthy and its features join the list.

OAuth is the one case where a provider’s availability can matter, and only when you’re adding a new identity. Authorizing a new OAuth identity runs that server’s OAuth authorization with its provider, so if the provider is unreachable at that moment, you can’t add the identity until the provider is back. A saved identity you’ve connected before, or a shared identity an administrator attached, needs no live authorization: MCP Manager uses the stored credential, so you can connect even if the server or its OAuth provider is unreachable, and the credential is exercised only on your first call.
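The rule above can be sketched as a small decision function: only a new OAuth identity depends on anything being reachable, and what it depends on is the OAuth provider, not the MCP server. This is a hypothetical sketch; `IdentityKind` and `can_finish_connecting` are illustrative names. Treating a newly entered header credential (a PAT or API key) as needing no live call is an assumption drawn from the statement that OAuth is the only case where availability matters.

```python
from enum import Enum


class IdentityKind(Enum):
    """Hypothetical labels for the identity choices described above."""
    NEW_OAUTH = "new OAuth identity"
    NEW_HEADER = "new header credential (PAT or API key)"  # assumption
    SAVED = "saved identity"
    SHARED = "shared identity attached by an admin"


def can_finish_connecting(kind: IdentityKind, provider_reachable: bool) -> bool:
    """Can the connection flow complete for one server right now?

    The server's own reachability is deliberately not a parameter: the
    authorization flow never contacts the server itself.
    """
    if kind is IdentityKind.NEW_OAUTH:
        # The only live dependency: completing OAuth with the provider.
        return provider_reachable
    # Everything else is recorded or reused without contacting anything.
    return True


for kind in IdentityKind:
    print(f"{kind.value}: {can_finish_connecting(kind, provider_reachable=False)}")
```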

Connecting records your credentials but doesn’t test them

The authorization flow — where your client sends you to MCP Manager to authorize and bring an identity for each server — records the identities and credentials you provide. It does not connect to the downstream servers or verify that those credentials work at that time. A wrong or expired credential, or a server that’s quietly down, therefore surfaces on the first call that uses it rather than during the connection flow. Completing the flow confirms every server has an identity attached, which is not the same as proving each identity works.
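The gap between "complete" and "working" can be sketched as a record that only stores credentials, plus a first call that exercises one for the first time. This is a minimal, hypothetical sketch: `PendingConnection`, `first_call`, and the `server_accepts` callback (standing in for the downstream server's own credential check) are illustrative, not MCP Manager APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PendingConnection:
    """Hypothetical record of one client's pass through the connection flow."""
    servers: List[str]
    identities: Dict[str, str] = field(default_factory=dict)

    def attach(self, server: str, credential: str) -> None:
        # Recording only: nothing is sent to the server or checked here.
        self.identities[server] = credential

    def is_complete(self) -> bool:
        # Complete means every server has *an* identity, not a working one.
        return all(s in self.identities for s in self.servers)


def first_call(conn: PendingConnection, server: str,
               server_accepts: Callable[[str], bool]) -> str:
    """The credential is exercised for the first time here, on a real call."""
    if server_accepts(conn.identities[server]):
        return "ok"
    return f"authentication failed for {server!r}"


conn = PendingConnection(["GitHub", "Jira"])
conn.attach("GitHub", "expired-token")
conn.attach("Jira", "valid-api-key")
print(conn.is_complete())  # True
print(first_call(conn, "GitHub", lambda cred: cred != "expired-token"))
# authentication failed for 'GitHub'
```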

A server that goes down during a session

If a server is healthy when you connect but goes down later, the tools you already loaded stay listed and calls to the healthy servers keep working. A call to the offline server comes back with a clear gateway error:

MCP Manager couldn’t reach ‘GitHub’. If the issue persists, provide the correlation ID below for further investigation.

Your client receives this as an ordinary tool error and can report it or move on, rather than stalling or losing the connection. When the server recovers, the next call to it simply succeeds again.
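Returning an outage as an ordinary tool error, rather than letting it escape as an exception or a dropped connection, can be sketched as follows. The result dict is shaped loosely after MCP’s `CallToolResult` (a `content` list plus an `isError` flag). `call_tool`, `ServerUnreachable`, and the "Correlation ID:" label are hypothetical, and using a random UUID as the correlation ID is an assumption; the document doesn’t say how MCP Manager generates it.

```python
import uuid


class ServerUnreachable(Exception):
    """Hypothetical error for a downstream server the gateway can't reach."""


def text_result(text: str, is_error: bool) -> dict:
    # Shaped like an MCP CallToolResult: a content list plus an isError flag.
    return {"content": [{"type": "text", "text": text}], "isError": is_error}


def call_tool(server: str, invoke) -> dict:
    """Forward one tool call; an outage comes back as a tool error, not an exception."""
    try:
        return text_result(invoke(), is_error=False)
    except ServerUnreachable:
        correlation_id = uuid.uuid4()  # hypothetical: lets support trace this failure
        return text_result(
            f"MCP Manager couldn’t reach ‘{server}’. If the issue persists, "
            "provide the correlation ID below for further investigation.\n"
            f"Correlation ID: {correlation_id}",
            is_error=True,
        )


def github_down():
    raise ServerUnreachable()


print(call_tool("GitHub", github_down)["isError"])            # True
print(call_tool("Jira", lambda: "ticket created")["isError"])  # False
```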

Why some tools are missing from your gateway

If a gateway shows fewer tools than you expect, a server was most likely unreachable when your client requested the tool list. The gateway builds that list from the servers it can reach at that moment, so a server that was down when the list was built contributes nothing to it. Reconnect or refresh the connection once the server is healthy, and the gateway gathers the list again from all reachable servers, restoring the missing tools. This is also why two people who connect at different times can briefly see different tool counts from the same gateway.

If a gateway is missing tools you know it should have, check the Alerts page for a server-discovery failure before changing any configuration — the alert names the server and the reason faster than you can infer it from the client side.
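Why two people can see different tool counts comes down to when each tool list was built. A minimal, hypothetical sketch, assuming a client keeps the list it received at connect time until it reconnects; the server names, tool names, and `Client` class are illustrative only.

```python
# Hypothetical servers and tools; "Docs" plays the server that was down.
SERVERS = {
    "GitHub": ["create_issue", "search_code"],
    "Jira": ["create_ticket"],
    "Docs": ["search_docs", "get_page"],
}
reachable = {"GitHub", "Jira"}  # Docs is down right now


def snapshot() -> list:
    """Build a tool list from whichever servers are reachable at this moment."""
    return [tool for name, tools in SERVERS.items() if name in reachable for tool in tools]


class Client:
    def __init__(self, build):
        self._build = build
        self.tools = build()  # built once, when the client connects

    def reconnect(self) -> None:
        self.tools = self._build()  # rebuilt from the servers reachable now


alice = Client(snapshot)  # connects while Docs is down
reachable.add("Docs")     # Docs recovers
bob = Client(snapshot)    # connects after recovery
print(len(alice.tools), len(bob.tools))  # 3 5
alice.reconnect()
print(len(alice.tools))   # 5
```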

When an identity’s credential stops working

A separate case from a plain outage: the server is reachable, but the identity you connected is rejected because its credential no longer works. Two things cause this:

An OAuth token that can’t be refreshed. MCP Manager refreshes a saved identity’s OAuth tokens automatically, so this is uncommon — but if a token expires and genuinely can’t be refreshed, the next call fails authentication.
A revoked or expired token credential. A header credential such as a GitHub personal access token or an API key doesn’t refresh; once it’s revoked or expires, the server rejects it and it has to be replaced.

Either way, the call fails authentication rather than returning a “couldn’t reach” error, and MCP Manager flags the server as needing re-authentication so your gateway surfaces a reconnect path. For an expired OAuth token, the caller sees:

MCP Manager needs ‘GitHub’ to be re-authenticated before this request can complete. Visit your gateway settings to reconnect.

You fix it by bringing a new identity rather than repairing the old one, because MCP Manager doesn’t re-authorize an identity in place: in your AI client, disconnect the gateway connector and reconnect it, then add or select a working identity for that server during the connection flow (see Connection Experience). The rest of the gateway keeps working throughout. The two messages tell you which situation you’re in: “couldn’t reach” means the server is down and you can wait out the outage, while “needs to be re-authenticated” means the credential has to be replaced by reconnecting with a new identity.
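The distinction between the two failures can be sketched as a classifier that runs after a call has failed. This is a hypothetical sketch, not MCP Manager’s logic: `route_failure` and its parameters are illustrative, the “couldn’t reach” message is abbreviated, and reusing the documented OAuth re-authentication message for header credentials is an assumption.

```python
from typing import Callable, Tuple


def route_failure(server: str, reachable: bool, credential_kind: str,
                  refresh: Callable[[], bool]) -> Tuple[str, str]:
    """Classify a failed call. Returns (action, message for the caller).

    Called only after a call has failed, so reachable=True means the server
    answered but rejected the credential.
    """
    if not reachable:
        # An outage: nothing is wrong with the identity, so wait it out.
        return ("wait", f"MCP Manager couldn’t reach ‘{server}’.")
    if credential_kind == "oauth" and refresh():
        # Rejected, but the OAuth token refreshed: retry without bothering anyone.
        return ("retry", "")
    # Header credentials (PATs, API keys) never refresh, and an OAuth token
    # that can't be refreshed ends up here too: the identity must be replaced.
    return ("reconnect",
            f"MCP Manager needs ‘{server}’ to be re-authenticated before this "
            "request can complete. Visit your gateway settings to reconnect.")


print(route_failure("GitHub", reachable=False, credential_kind="oauth", refresh=lambda: True)[0])  # wait
print(route_failure("GitHub", reachable=True, credential_kind="header", refresh=lambda: True)[0])  # reconnect
```

Because the `and` short-circuits, a header credential never triggers a refresh attempt, matching the point that such credentials don’t refresh.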

Recovery is automatic — no rebuild required

The gateway doesn’t lock a server out after it fails; it tries the server again on the next request. A downed server therefore recovers on its own: calls succeed again and its tools return the next time a client loads them, with no restart, no remove-and-re-add, and no configuration change on your part. If you’re wondering what to do during an outage, the answer is simply to wait it out: the rest of the gateway keeps serving, and the server rejoins automatically once it’s healthy.
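The no-lockout behavior can be sketched as a forwarder that keeps no failure state at all, so every request is a fresh attempt. This is a minimal, hypothetical sketch; `Upstream` and `forward` are illustrative names. It deliberately omits what a circuit breaker would add (a failure counter and a cooldown during which requests are refused without trying the server), since the text above says the gateway does not lock servers out.

```python
class Upstream:
    """Hypothetical downstream server whose health we can flip."""

    def __init__(self) -> None:
        self.healthy = False
        self.attempts = 0

    def call(self) -> str:
        self.attempts += 1
        if not self.healthy:
            raise ConnectionError("unreachable")
        return "ok"


def forward(upstream: Upstream) -> str:
    # No failure counter, cooldown, or "disabled" flag: every request is
    # attempted fresh, so a recovered server is used again immediately.
    try:
        return upstream.call()
    except ConnectionError:
        return "couldn't reach"


up = Upstream()
results = [forward(up) for _ in range(3)]  # three failed attempts while down
up.healthy = True
print(forward(up))   # ok
print(up.attempts)   # 4
```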

Do you need a separate gateway for an unreliable server?

During an outage, the instinct is often to build a new gateway that leaves the troublesome server out. For a transient outage that’s wasted effort: the gateway already isolates a down server and lets it rejoin on recovery, so a parallel gateway only adds something to maintain and later unwind. Two situations do warrant a deliberate change:

Permanently retiring a server. Remove it from the gateway, or switch off its per-layer enabled toggle (Runtime Protections): a clean kill-switch, checked on every request, that immediately stops the server from being offered, with nothing deleted.
Isolating a chronically flaky server. If one server is unreliable often enough that its alerts or missing tools disrupt a gateway you depend on, move it to its own gateway. Reach for this when a server is persistently unreliable, not for a one-off the gateway is built to absorb.
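The kill-switch option can be sketched as a check evaluated on every request, so flipping a toggle takes effect on the next call without deleting anything. This is a hypothetical sketch: `Toggles` and `is_offered` are illustrative names, and the rule that a server is offered only when its own, its host’s, and its gateway’s toggles are all on is an assumption based on the server, host, and gateway layers named in Runtime Protections.

```python
from dataclasses import dataclass


@dataclass
class Toggles:
    """Hypothetical per-layer switches; all on by default."""
    gateway: bool = True
    host: bool = True
    server: bool = True


def is_offered(t: Toggles) -> bool:
    # Assumed semantics: every layer must be enabled. Evaluated per request,
    # so switching any layer off hides the server on the very next call.
    return t.gateway and t.host and t.server


servers = {
    "GitHub": Toggles(),
    "LegacyWiki": Toggles(server=False),  # retired: switched off, not deleted
}
offered = [name for name, t in servers.items() if is_offered(t)]
print(offered)  # ['GitHub']
```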

At a glance: server-outage scenarios

Scenario	What you experience	What to do
A server is down when you connect	You connect to the rest of the gateway; the offline server’s tools are absent	Reconnect once the server is healthy to pick its tools back up
A server goes down mid-session	Calls to it return a “couldn’t reach” error; other servers keep working	Wait it out — the next call succeeds once it recovers
A gateway is missing tools	Tools from a server that was unreachable when the list was built are absent	Check Alerts, then reconnect once the server is back
An identity’s credential stops working	Calls return a “needs to be re-authenticated” error	Reconnect the gateway in your client and bring a new identity
A server is permanently gone	—	Remove it or switch off its `enabled` toggle

In every row, the gateway’s healthy servers keep working, and recovering from a temporary outage needs nothing beyond reconnecting when prompted.

Further reading

Connection Experience

The guided, server-by-server connection flow, and how saved identities are reused.

Alerts

The admin feed where a server-discovery or connection failure is surfaced.

Architecture & Trust

Why the gateway sits in the path of every call, and the latency it adds.

Runtime Protections

The per-layer enabled toggles that switch a server, host, or gateway off instantly.

MCP Gateways

What a gateway is and how it bundles several MCP servers behind one connection.
