What happens when one MCP server is down
The gateway reaches each MCP server independently, so a failure stays contained to the server that failed: its tools become temporarily unavailable while every healthy server in the gateway keeps working. A down server shows up in exactly three ways:
- Its tools drop off the list. When your client asks the gateway for available tools, the gateway returns the tools from every server it can reach and omits the one it can’t.
- A call to it returns a clear error. A tool call to the offline server gets back a plain-language error naming the server, not a hang and not a failure that spreads to your other tools.
- An admin alert is raised. The gateway records an alert naming the failed server and the reason, so an administrator sees the outage on the Alerts page instead of inferring it from a vague client-side error.
The Alerts feed is in-app and visible to roles with the See all alerts capability. End users don’t need it to understand an outage: the error they get back already names the affected server.
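The containment described above can be pictured with a minimal sketch. It is not MCP Manager's implementation: the `Gateway` class, its `servers` mapping, and the alert strings are hypothetical names chosen for illustration, and a server outage is modeled simply as a `ConnectionError`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Gateway:
    # Hypothetical model: each server name maps to a function that returns
    # its tool names, or raises ConnectionError when the server is down.
    servers: dict[str, Callable[[], list[str]]]
    alerts: list[str] = field(default_factory=list)

    def list_tools(self) -> list[str]:
        tools: list[str] = []
        for name, fetch_tools in self.servers.items():
            try:
                tools.extend(fetch_tools())
            except ConnectionError as exc:
                # Contain the failure: record an alert naming the server and
                # the reason, skip it, and keep collecting from the others.
                self.alerts.append(f"{name}: {exc}")
        return tools


def github_down() -> list[str]:
    raise ConnectionError("connection refused")


gateway = Gateway(servers={
    "GitHub": github_down,
    "Jira": lambda: ["jira.search", "jira.create_issue"],
})
print(gateway.list_tools())  # only the tools from the reachable server
print(gateway.alerts)        # one alert naming GitHub and the reason
```

The point of the sketch is the `try`/`except` inside the loop: one server's failure is handled per server, so it never aborts the whole listing.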
A server that’s unreachable when you connect
When you connect an MCP client, like Claude, to a gateway, MCP Manager walks you through its servers and gathers what each one needs (see Connection Experience). A server that’s down doesn’t stop you connecting, because the authorization flow never contacts the server itself. Once your client connects to the gateway, an offline server simply contributes no tools, resources, or prompts when your client loads them. Reconnect once the server is healthy and its features join the list.
OAuth is the one case where a provider’s availability can matter, and only when you’re adding a new identity. Authorizing a new OAuth identity completes that server’s OAuth authorization with its provider, so if the provider is unreachable right then, you can’t add that identity until it’s back. A saved identity you’ve connected before, or a shared identity an administrator attached, needs no live authorization: MCP Manager uses the stored credential, so you connect even if the server or its OAuth provider is unreachable, and the credential is exercised only on your first call.
Connecting records your credentials but doesn’t test them
The authorization flow (where your client sends you to MCP Manager to authorize and bring an identity for each server) records the identities and credentials you provide. It does not connect to the downstream servers or verify that those credentials work at that time. A wrong or expired credential, or a server that’s quietly down, therefore surfaces on the first call that uses it rather than during the connection flow. Completing the flow confirms every server has an identity attached, which is not the same as proving each identity works.
A server that goes down during a session
If a server is healthy when you connect but goes down later, the tools you already loaded stay listed and calls to the healthy servers keep working. A call to the offline server comes back with a clear gateway error:
MCP Manager couldn’t reach ‘GitHub’. If the issue persists, provide the correlation ID below for further investigation.
Your client receives this as an ordinary tool error and can report it or move on, rather than stalling or losing the connection. When the server recovers, the next call to it simply succeeds again.
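As a rough illustration of this behavior, the sketch below attempts each call fresh and turns an unreachable server into an ordinary error result instead of raising or hanging. The `call_tool` function, the result dictionary shape, and the UUID-style correlation ID are assumptions made for the example, not MCP Manager's actual API; only the message wording comes from the text above.

```python
import uuid
from typing import Callable


def call_tool(server_name: str, invoke: Callable[[], str]) -> dict:
    """Attempt the call fresh each time; no failure state is remembered."""
    try:
        return {"is_error": False, "text": invoke()}
    except ConnectionError:
        # Return an ordinary tool error instead of raising or hanging.
        return {
            "is_error": True,
            "text": f"MCP Manager couldn't reach '{server_name}'. If the issue "
                    "persists, provide the correlation ID below for further "
                    "investigation.",
            "correlation_id": str(uuid.uuid4()),  # hypothetical ID format
        }


# Simulate a server that is down for the first call and healthy afterwards.
state = {"healthy": False}


def github_list_repos() -> str:
    if not state["healthy"]:
        raise ConnectionError("timed out")
    return "repo-a, repo-b"


first = call_tool("GitHub", github_list_repos)
state["healthy"] = True  # the server recovers
second = call_tool("GitHub", github_list_repos)
print(first["is_error"], second["is_error"])  # True False
```

Because nothing about the earlier failure is stored, the second call succeeds as soon as the server is back, with no reset step in between.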
Why some tools are missing from your gateway
If a gateway shows fewer tools than you expect, a server was most likely unreachable when your client requested the tool list. The gateway builds that list from the servers it can reach at that moment, so a server that was down during connection contributes nothing to it. Reconnect or refresh the connection once the server is healthy and the gateway gathers the list again from all reachable servers, restoring the missing tools. This is also why two people who connect at different times can briefly see different tool counts from the same gateway.
When an identity’s credential stops working
A separate case from a plain outage: the server is reachable, but the identity you connected is rejected because its credential no longer works. Two things cause this:
- An OAuth token that can’t be refreshed. MCP Manager refreshes a saved identity’s OAuth tokens automatically, so this is uncommon, but if a token expires and genuinely can’t be refreshed, the next call fails authentication.
- A revoked or expired token credential. A header credential such as a GitHub personal access token or an API key doesn’t refresh; once it’s revoked or expires, the server rejects it and it has to be replaced.
In either case, the call returns an error like this:
MCP Manager needs ‘GitHub’ to be re-authenticated before this request can complete. Visit your gateway settings to reconnect.
You fix it by bringing a new identity rather than repairing the old one: in your AI client, disconnect the gateway connector and reconnect it, then add or select a working identity for that server during the connection flow (see Connection Experience). MCP Manager doesn’t re-authorize an identity in place. The rest of the gateway keeps working throughout. The two messages tell you what to do: “couldn’t reach” points to a server that’s down, so you wait out the outage; this one points to a credential that no longer works, so you reconnect and bring a new identity.
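If you script around a client, the two messages can be mapped to the two responses. The sketch below is only an illustration: the `Action` enum and `classify_gateway_error` are hypothetical names, it assumes the gateway's error text reaches your code unchanged, and matching on message wording is brittle if that wording ever changes.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait for the server to recover"
    RECONNECT = "reconnect the gateway and bring a new identity"
    UNKNOWN = "inspect the error manually"


def classify_gateway_error(message: str) -> Action:
    # Normalize typographic apostrophes so both quote styles match.
    text = message.replace("\u2019", "'").lower()
    if "couldn't reach" in text:
        return Action.WAIT
    if "re-authenticated" in text:
        return Action.RECONNECT
    return Action.UNKNOWN


print(classify_gateway_error(
    "MCP Manager couldn\u2019t reach \u2018GitHub\u2019."
).value)
print(classify_gateway_error(
    "MCP Manager needs 'GitHub' to be re-authenticated "
    "before this request can complete."
).value)
```

Anything that matches neither phrase falls through to `Action.UNKNOWN`, so an unfamiliar error is never silently treated as an outage.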
Recovery is automatic — no rebuild required
The gateway doesn’t lock a server out after it fails; it retries on the next request. A downed server therefore recovers on its own: calls succeed again and its tools return the next time a client loads them, with no restart, no remove-and-re-add, and no configuration change on your part. The answer to the most common worry during an outage is simply to wait it out: the rest of the gateway keeps serving, and the server rejoins automatically once it’s healthy.
Do you need a separate gateway for an unreliable server?
During an outage, the instinct is often to build a new gateway that leaves the troublesome server out. For a transient outage that’s wasted effort: the gateway already isolates a down server and lets it rejoin on recovery, so a parallel gateway only adds something to maintain and later unwind. Two situations do warrant a deliberate change:
- Permanently retiring a server. Remove it from the gateway, or switch off its per-layer enabled toggle (see Runtime Protections). The toggle is a clean kill-switch, checked on every request, that stops the server being offered immediately with nothing deleted.
- Isolating a chronically flaky server. If one server is unreliable often enough that its alerts or missing tools disrupt a gateway you depend on, move it to its own gateway. Reach for this when a server is persistently unreliable, not for a one-off the gateway is built to absorb.
At a glance: server-outage scenarios
| Scenario | What you experience | What to do |
|---|---|---|
| A server is down when you connect | You connect to the rest of the gateway; the offline server’s tools are absent | Reconnect once the server is healthy to pick its tools back up |
| A server goes down mid-session | Calls to it return a “couldn’t reach” error; other servers keep working | Wait it out — the next call succeeds once it recovers |
| A gateway is missing tools | A server was unreachable when the tool list was built | Check Alerts, then reconnect once it’s back |
| An identity’s credential stops working | Calls return a “needs to be re-authenticated” error | Reconnect the gateway in your client and bring a new identity |
| A server is permanently gone | — | Remove it or switch off its enabled toggle |
Further reading
Connection Experience
The guided, server-by-server connection flow, and how saved identities are reused.
Alerts
The admin feed where a server-discovery or connection failure is surfaced.
Architecture & Trust
Why the gateway sits in the path of every call, and the latency it adds.
Runtime Protections
The per-layer enabled toggles that switch a server, host, or gateway off instantly.
MCP Gateways
What a gateway is and how it bundles several MCP servers behind one connection.

