Graceful Shutdown

When a deployment rolls or a pod is rescheduled, your application receives a SIGTERM and is expected to stop. If it stops instantly, any requests being processed are dropped and clients see errors. Graceful shutdown keeps the application alive just long enough to finish in-flight requests while refusing new ones, turning a noisy redeploy into a clean one. Spring Boot makes this a one-line setting.

Enabling graceful shutdown

Set the web server to drain requests on shutdown and give it a timeout budget.

server:
  shutdown: graceful          # default was 'immediate' before Spring Boot 3.4; 'graceful' from 3.4 on
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s   # max time to wait for in-flight work

When the JVM receives SIGTERM, Spring Boot:

1. Stops the web server from accepting new requests.
2. Lets in-flight requests run to completion, up to timeout-per-shutdown-phase.
3. Closes the application context (datasources, executors, message listeners).
4. Exits.

Output (console on SIGTERM):

2026-06-13T10:40:12.118  INFO  o.s.b.w.e.tomcat.GracefulShutdown : Commencing graceful shutdown. Waiting for active requests to complete
2026-06-13T10:40:13.402  INFO  o.s.b.w.e.tomcat.GracefulShutdown : Graceful shutdown complete

If requests are still running when the timeout elapses, the server proceeds with shutdown anyway and logs that some requests were not completed.
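The drain-then-give-up behaviour can be illustrated outside Spring with a plain JDK thread pool. This is only an analogy, not Spring Boot's implementation: the class name DrainSketch and its methods are hypothetical, and the pool stands in for the web server's request threads.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: refuse new work, drain in-flight work, give up after a timeout. */
class DrainSketch {

    /**
     * Returns true if every in-flight task finished within the timeout,
     * false if the budget ran out and the remaining tasks were interrupted.
     */
    static boolean drain(ExecutorService pool, Duration timeout) {
        pool.shutdown(); // refuse new tasks; tasks already submitted keep running
        try {
            if (pool.awaitTermination(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
                return true; // clean drain
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and fall through
        }
        pool.shutdownNow(); // budget exhausted: interrupt whatever is still running
        return false;
    }

    /** Stand-in for request processing time. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> sleepQuietly(200)); // an "in-flight request"
        boolean clean = drain(pool, Duration.ofSeconds(2));
        System.out.println(clean ? "drain complete" : "drain aborted with work still active");
    }
}
```

The timeout plays the role of timeout-per-shutdown-phase: a budget that is too short turns a clean drain into an aborted one, which is why it should be sized against the slowest legitimate request.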

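Note: Graceful shutdown is supported on all four embedded servers: Tomcat, Jetty, Undertow, and Reactor Netty (for WebFlux). The property is the same for all of them, but the way new requests are refused differs. Tomcat, Jetty, and Reactor Netty stop accepting requests at the network layer, while Undertow keeps accepting connections and answers new requests immediately with 503 Service Unavailable.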

Understanding the timeout

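spring.lifecycle.timeout-per-shutdown-phase is the grace budget for each lifecycle shutdown phase, not only for the web server’s drain, so total shutdown time can exceed it when several phases have slow work. Set it longer than your slowest legitimate request but short enough that the whole shutdown fits inside the platform’s hard-kill window; otherwise the orchestrator will SIGKILL the process mid-drain.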

Property	Default	Role
`server.shutdown`	`immediate` before Spring Boot 3.4, `graceful` from 3.4	`graceful` enables request draining
`spring.lifecycle.timeout-per-shutdown-phase`	`30s`	max wait for each shutdown phase

SIGTERM handling in containers

In a container, the runtime delivers SIGTERM to PID 1 on docker stop or a Kubernetes pod termination, and ideally the JVM is that process. Spring Boot registers a JVM shutdown hook that triggers the graceful sequence, so as long as SIGTERM reaches the Java process, draining happens automatically.
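For readers unfamiliar with the mechanism, here is a minimal plain-JDK sketch of a shutdown hook. It is not what Spring Boot registers internally (its hook closes the application context); the class ShutdownHookSketch and the method newCleanupHook are hypothetical names used only for illustration.

```java
/** Hypothetical sketch of the JVM mechanism that reacts to SIGTERM. */
class ShutdownHookSketch {

    /** Wraps cleanup logic in an unstarted thread, which is what the JVM expects as a hook. */
    static Thread newCleanupHook(Runnable cleanup) {
        return new Thread(cleanup, "cleanup-hook");
    }

    public static void main(String[] args) throws InterruptedException {
        // The JVM starts registered hooks when it begins shutting down, for example
        // after SIGTERM or Ctrl+C. SIGKILL cannot be caught, so no hook runs for it.
        Runtime.getRuntime().addShutdownHook(
                newCleanupHook(() -> System.out.println("JVM shutting down, running cleanup")));

        System.out.println("Running; send SIGTERM to this process to trigger the hook");
        Thread.sleep(60_000); // keep the process alive long enough to signal it
    }
}
```

If the signal never reaches the Java process, for instance because a shell sits in front of it as PID 1, the hook is not triggered by the signal and no draining takes place.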

A common mistake is launching the app through a shell (ENTRYPOINT java -jar app.jar written in shell form), which makes the shell PID 1 and may not forward SIGTERM to Java. Use the exec form so the JVM is PID 1:

# Correct: exec form — java is PID 1 and receives SIGTERM directly
ENTRYPOINT ["java", "-jar", "app.jar"]

# Risky: shell form — the shell is PID 1 and may swallow the signal
# ENTRYPOINT java -jar app.jar

See Dockerizing Spring Boot for the full image setup.

Kubernetes coordination

Graceful shutdown alone isn’t enough on Kubernetes because of a race: when a pod is deleted, Kubernetes (a) starts the container termination sequence that ends in SIGTERM and (b) removes the pod from Service endpoints, and the two proceed in parallel. Until the endpoint change has propagated, kube-proxy or an ingress controller may still route new traffic to a pod that has already stopped accepting requests, producing dropped requests.

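The standard fix is a small preStop sleep that delays SIGTERM until endpoint removal has propagated, plus a terminationGracePeriodSeconds larger than the preStop sleep and the Spring shutdown timeout combined, because the grace period starts counting before the preStop hook runs.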

spec:
  terminationGracePeriodSeconds: 45   # > preStop sleep (5s) + timeout-per-shutdown-phase (30s)
  containers:
    - name: order-service
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 5"]   # let endpoint removal propagate
      readinessProbe:
        httpGet: { path: /actuator/health/readiness, port: 8080 }

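Spring Boot complements this: when the application context starts closing, which happens once SIGTERM arrives after the preStop sleep, it moves the readiness state to REFUSING_TRAFFIC, and the readiness health group reports OUT_OF_SERVICE. Readiness does not change during the preStop sleep itself; in that window the application keeps serving traffic normally while endpoint removal propagates. The liveness and readiness health groups are enabled automatically when Spring Boot detects that it is running on Kubernetes. To enable them explicitly in any environment: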

management:
  endpoint:
    health:
      probes:
        enabled: true

Warning: Make sure terminationGracePeriodSeconds (Kubernetes) is comfortably larger than timeout-per-shutdown-phase (Spring), accounting for the preStop sleep. If the platform’s kill window is shorter, Kubernetes SIGKILLs the JVM mid-drain and the graceful logic is wasted.
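The budget arithmetic behind this warning can be written down as a small check. This is a sketch under one simplifying assumption: only a single shutdown phase (the web server drain) uses its full timeout. The class ShutdownBudget and the method headroom are hypothetical names, and the values are the ones used in the manifest and configuration above.

```java
import java.time.Duration;

/** Hypothetical helper: how much slack remains before Kubernetes sends SIGKILL. */
class ShutdownBudget {

    /**
     * Slack between the latest expected JVM exit and the SIGKILL deadline.
     * Assumes one shutdown phase consumes at most timeoutPerShutdownPhase.
     * A negative result means the drain can be cut off mid-flight.
     */
    static Duration headroom(Duration terminationGracePeriod,
                             Duration preStopSleep,
                             Duration timeoutPerShutdownPhase) {
        // The grace period starts when termination begins, so the preStop sleep eats into it.
        return terminationGracePeriod.minus(preStopSleep).minus(timeoutPerShutdownPhase);
    }

    public static void main(String[] args) {
        Duration slack = headroom(
                Duration.ofSeconds(45),  // terminationGracePeriodSeconds
                Duration.ofSeconds(5),   // preStop sleep
                Duration.ofSeconds(30)); // spring.lifecycle.timeout-per-shutdown-phase
        System.out.println("headroom: " + slack.getSeconds() + "s"); // prints "headroom: 10s"
    }
}
```

With a 30-second grace period instead of 45, the same inputs give a negative headroom of 5 seconds, which is the configuration the warning describes.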

Timeline of a clean shutdown

t=0s   Pod marked Terminating; preStop sleep starts; application still serving normally
t=0s   Kubernetes begins removing the pod from Service endpoints
t=5s   preStop finishes; SIGTERM delivered to the JVM (PID 1); readiness -> OUT_OF_SERVICE
t=5s   Tomcat stops accepting new requests; in-flight requests continue
t≤35s  All in-flight requests complete; context closes; JVM exits 0
t=45s  (terminationGracePeriodSeconds) hard SIGKILL — never reached if drain finished

Best Practices

Set server.shutdown=graceful in every deployed service.
Tune timeout-per-shutdown-phase to your slowest real request, not an arbitrary value.
Launch the JVM with the Dockerfile exec form so it is PID 1 and gets SIGTERM.
On Kubernetes, add a preStop sleep and set terminationGracePeriodSeconds above the preStop sleep plus the Spring timeout.
Enable availability probes so readiness flips to OUT_OF_SERVICE during shutdown.