
Your MCP Server Works Locally. Then Kubernetes Kills the Session.


We’re running a Spring AI MCP server in production on GKE. It serves Claude Code as a client, exposing 60+ tools for managing climbing data — sites, routes, venues, the works. The auth layer is solid: JWT tokens persisted in PostgreSQL, dual validation (HS256 service tokens, RS256 OAuth user tokens), revocation checks on every request.
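For context, "dual validation" here just means inspecting the token's alg header and routing to the matching verification path before anything else. A minimal plain-JDK sketch of that dispatch (class and method names are illustrative, not our actual code; real validation would verify signatures and expiry with a proper JWT library and check revocation against the database):

```java
import java.util.Base64;

public class TokenRouter {
    /** Extracts the "alg" field from a JWT's Base64URL-encoded header segment. */
    static String algorithmOf(String jwt) {
        String headerSegment = jwt.split("\\.")[0];
        String headerJson = new String(Base64.getUrlDecoder().decode(headerSegment));
        // Naive extraction for the sketch; use a JSON parser in real code.
        int i = headerJson.indexOf("\"alg\"");
        int start = headerJson.indexOf('"', i + 5) + 1;
        return headerJson.substring(start, headerJson.indexOf('"', start));
    }

    /** Routes to the matching validation path: HS256 service tokens, RS256 OAuth user tokens. */
    static String route(String jwt) {
        return switch (algorithmOf(jwt)) {
            case "HS256" -> "service-token-validation";    // shared-secret HMAC
            case "RS256" -> "oauth-user-token-validation"; // RSA public key
            default -> throw new IllegalArgumentException("Unsupported alg");
        };
    }
}
```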

Then, intermittently, Claude Code starts throwing this:

Text
Session not found: f4456324-79fb-4d84-ab2b-eebf27c5a475

The token is valid. The server is up. The endpoint responds. But the session is gone.

What MCP “Sessions” Actually Are

If you’re deploying MCP servers with the Streamable HTTP transport, you need to understand that there are two completely independent authentication layers at play:

Layer             | What it is         | Where it lives                | Survives restart?
------------------|--------------------|-------------------------------|------------------
Auth token        | JWT Bearer token   | PostgreSQL / Redis            | Yes
Transport session | MCP protocol state | ConcurrentHashMap in JVM heap | No

The auth token is yours. You control its lifecycle, persistence, and revocation. The transport session is the MCP SDK’s internal bookkeeping — it maps a Mcp-Session-Id header to a McpStreamableServerSession object that holds reactive subscriptions, SSE sinks, the protocol state machine, and capability negotiation results.

Here’s the thing: you can’t persist it. It contains live JVM objects — Reactor Sinks, Flux pipelines, thread-bound state. Serializing it to Redis or PostgreSQL isn’t viable (we dig into the details below).
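A toy version of that bookkeeping makes the failure mode obvious. This is not SDK code, just a sketch of the pattern: a ConcurrentHashMap keyed by session ID, living only in one JVM's heap:

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class SessionRegistry {
    // Lives only in this JVM's heap: a new pod starts with an empty map.
    private final Map<String, Object> sessions = new ConcurrentHashMap<>();

    String create() {
        String id = UUID.randomUUID().toString();
        sessions.put(id, new Object()); // stands in for live protocol state
        return id;
    }

    Optional<Object> lookup(String mcpSessionId) {
        // After a pod restart, or on a different replica, this is empty
        // and the transport answers "Session not found".
        return Optional.ofNullable(sessions.get(mcpSessionId));
    }
}
```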

What Actually Causes “Session not found”

  1. Rolling deployment — Kubernetes replaces pods one by one. The new pod has an empty ConcurrentHashMap. Clients holding old session IDs get 404.
  2. Pod restart — OOM kill, liveness probe failure, node preemption. Same result.
  3. Multiple replicas — Request hits pod B, but the session was created on pod A. Without sticky sessions, this fails randomly.

All three are standard Kubernetes behavior. The MCP SDK's default transport just wasn't designed with them in mind yet.

The One-Line Fix

Spring AI’s MCP server auto-configuration supports three transport protocols:

Java
public enum ServerProtocol {
    SSE,          // Legacy Server-Sent Events
    STREAMABLE,   // Stateful Streamable HTTP (default)
    STATELESS     // Stateless Streamable HTTP
}

STATELESS implements a subset of the Streamable HTTP spec where no server-side session state is maintained between requests. Each HTTP POST to /mcp is self-contained — auth via Bearer token, no Mcp-Session-Id header, no session lookup, no ConcurrentHashMap.

The fix:

YAML
# Before: stateful sessions, dies on pod restart
spring:
  ai:
    mcp:
      server:
        protocol: STREAMABLE
        streamable-http:
          mcp-endpoint: /mcp
          keep-alive-interval: 30s

# After: stateless, survives anything Kubernetes throws at it
spring:
  ai:
    mcp:
      server:
        protocol: STATELESS
        streamable-http:
          mcp-endpoint: /mcp

No session. No keep-alive-interval (there’s no persistent connection to keep alive). No ConcurrentHashMap. Any pod can serve any request. Rolling deployments are invisible to clients.
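From the client's side, a stateless call is then just an ordinary authenticated POST. A sketch with java.net.http (the endpoint and token are placeholders; the Accept header reflects the Streamable HTTP spec's requirement that clients accept both JSON and SSE responses):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class StatelessCall {
    static HttpRequest build(String baseUrl, String bearerToken) {
        String jsonRpc = """
                {"jsonrpc":"2.0","id":1,"method":"tools/list"}""";
        // No Mcp-Session-Id header: every request is self-contained.
        return HttpRequest.newBuilder(URI.create(baseUrl + "/mcp"))
                .header("Authorization", "Bearer " + bearerToken)
                .header("Content-Type", "application/json")
                .header("Accept", "application/json, text/event-stream")
                .POST(HttpRequest.BodyPublishers.ofString(jsonRpc))
                .build();
    }
}
```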

What You Lose (And Why It Probably Doesn’t Matter)

Stateless mode has one real limitation: no server-to-client notifications. In stateful mode, the server can push log messages, progress updates, and resource change notifications to the client via SSE. In stateless mode, communication is strictly request-response.

Ask yourself: do your MCP tools call exchange.loggingNotification() or exchange.sendResourceChanged()? If not — and most don’t — you lose nothing.

Our 60+ tools are all request-response: client calls tool, server returns result. No push notifications, no streaming updates, no server-initiated messages. Stateless mode is functionally identical for this workload.

The Auto-Configuration Just Works

If you’re using Spring AI’s annotation-based approach (@McpTool, @McpResource, @McpPrompt), the switch is transparent. Spring AI has parallel auto-configuration classes:

  • McpServerStreamableHttpWebFluxAutoConfiguration — creates WebFluxStreamableServerTransportProvider (stateful)
  • McpServerStatelessWebFluxAutoConfiguration — creates WebFluxStatelessServerTransport (stateless)

The annotation scanner has a StatelessServerSpecificationFactoryAutoConfiguration that automatically converts your @McpTool methods to McpStatelessServerFeatures.AsyncToolSpecification beans. Your tool code doesn’t change. Your security WebFilter doesn’t change (it matches on /mcp path regardless of transport type). Your auth token validation doesn’t change.

The Bigger Picture: MCP in Production Is Uncharted Territory

As of early 2026, the MCP ecosystem is moving fast. The Java SDK (v0.17.x) and Spring AI integration (2.0.0-M2) are production-capable but still maturing. A few things we’ve learned deploying to GKE:

Session management is the first thing that breaks. The SDK’s in-memory session store works perfectly for local development (single process, no restarts). It breaks immediately in any environment with rolling deployments or horizontal scaling. The STATELESS transport is the answer, but it’s not the default and the documentation doesn’t warn you about this.

Auth is your responsibility. The MCP spec includes an OAuth flow, but the SDK doesn’t implement server-side auth out of the box. We built a full auth layer: JWT validation, token revocation, rate limiting, dual token types (service accounts for CI, OAuth tokens for interactive use). If you’re exposing MCP over the internet, plan for this from day one.

The SDK has no session expiration. Even in stateful mode, sessions live forever in the ConcurrentHashMap — there’s no idle timeout, no TTL, no eviction. If you stay on STREAMABLE, leaked sessions will accumulate until the pod restarts. This is a known gap.
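Idle-timeout eviction is not conceptually hard, which makes the gap more surprising. A hypothetical sketch of what a TTL-evicting session map could look like (this is not SDK API, just plain JDK):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ExpiringSessionMap<V> {
    private record Entry<T>(T value, Instant lastSeen) {}

    private final Map<String, Entry<V>> sessions = new ConcurrentHashMap<>();
    private final Duration idleTimeout;

    public ExpiringSessionMap(Duration idleTimeout) {
        this.idleTimeout = idleTimeout;
    }

    public void put(String id, V value) {
        sessions.put(id, new Entry<>(value, Instant.now()));
    }

    /** Looks up a session and refreshes its last-seen timestamp; null if absent. */
    public V touch(String id) {
        Entry<V> e = sessions.get(id);
        if (e == null) return null;
        sessions.put(id, new Entry<>(e.value(), Instant.now()));
        return e.value();
    }

    /** Would be called from a scheduled task: drops sessions idle past the timeout. */
    public void evictIdle() {
        Instant cutoff = Instant.now().minus(idleTimeout);
        sessions.entrySet().removeIf(e -> e.getValue().lastSeen().isBefore(cutoff));
    }

    public int size() { return sessions.size(); }
}
```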

Clients don’t handle session loss gracefully. When a session disappears (pod restart, deployment), the client gets a 404 and stops working. There’s no automatic re-initialization in Claude Code — you have to restart the MCP connection manually. The Java SDK issue tracker has several reports of this. Going stateless sidesteps the problem entirely.

“Can’t We Just Persist Sessions to Redis?”

If you’ve already persisted auth tokens to a database, the natural instinct is to do the same for transport sessions. We dug into the SDK source to see if this is viable. It isn’t — and here’s why.

A McpStreamableServerSession holds live FluxSink<ServerSentEvent<?>> objects — the actual SSE connections to the client, bound to a specific HTTP response stream on a specific pod. It also holds ConcurrentHashMap<Object, MonoSink<JSONRPCResponse>> for pending request/response pairs. These are Reactor primitives tied to a running JVM — serializing them to Redis is like trying to serialize an open TCP socket.
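You can demonstrate the problem in miniature: give a "session" object a single live callback field (a lambda standing in for a FluxSink bound to an open HTTP response) and standard Java serialization fails outright, and any byte-oriented store such as Redis hits the same wall:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Consumer;

public class SessionSerializationDemo {
    /** A toy session: the id is plain data, but the sink is live runtime state. */
    static class FakeSession implements Serializable {
        final String id;
        final Consumer<String> sink; // stands in for a FluxSink tied to one pod

        FakeSession(String id, Consumer<String> sink) {
            this.id = id;
            this.sink = sink;
        }
    }

    /** Returns true if the object survives Java serialization, false otherwise. */
    static boolean serializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false; // NotSerializableException is an IOException
        }
    }
}
```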

The session also holds Map<String, McpRequestHandler<?>> — your @McpTool handler functions. These are Java lambdas that could theoretically be re-injected on session restoration, but the SDK doesn’t separate “restorable metadata” from “runtime-only state.” Everything is mixed into one object with no SessionStore abstraction.

We examined the extension points:

  • McpStreamableServerSession.Factory is pluggable via setSessionFactory(), but it only handles creation. There’s no hook for session lookup — the sessions field is a private ConcurrentHashMap inside WebFluxStreamableServerTransportProvider with no getter or pluggable interface.
  • WebFluxStreamableServerTransportProvider could be reimplemented entirely (it implements a clean interface), but you’d be rewriting ~500 lines of HTTP handling, SSE management, and session lifecycle — essentially forking the transport layer.
  • The SDK’s own TODOs (// TODO: review in the context of history storage, // TODO: store message in history) confirm the authors are planning for session persistence, but it’s not there yet. The replay() method returns Flux.empty().

If you genuinely need STREAMABLE in Kubernetes (e.g., for server-push logging), the realistic options today are:

  1. Sticky sessions at the ingress level (sessionAffinity: ClientIP or cookie-based) — simplest, but loses sessions on pod restart
  2. Single replica — no session routing issues, but no HA
  3. Contribute a SessionStore abstraction to the SDK — the interface boundary is clean enough; issue #107 would be the natural starting point
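For completeness, option 1 in its simplest form is ClientIP affinity on the Kubernetes Service (a sketch; cookie-based affinity would instead be configured on your ingress controller):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mcp-server
spec:
  selector:
    app: mcp-server
  ports:
    - port: 8080
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # the default; affinity still dies with the pod
```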

For request-response workloads, none of these are worth the complexity. Go STATELESS.

TL;DR

If you're…                             | Do this
---------------------------------------|--------
Running MCP locally or single-instance | STREAMABLE is fine
Deploying to Kubernetes with replicas  | Switch to STATELESS
Using @McpTool annotations             | No code changes needed
Calling exchange.loggingNotification() | Stay on STREAMABLE, accept the session risk
Tempted to persist sessions to Redis   | Don't — they contain non-serializable runtime state

The MCP protocol is one of the most interesting things happening in AI tooling right now. But “works on my machine” and “works in production on Kubernetes” are very different bars. If you’re one of the early teams pushing MCP servers into real multi-pod deployments, save yourself the debugging: go stateless from the start.



Stack: Spring AI 2.0.0-M2, MCP Java SDK 0.17.1, Spring Boot 4.0.1, GKE with rolling deployments. MCP client: Claude Code (Anthropic CLI).
