Go, NodeJs, Docker

Graceful Shutdown Tutorial

At the last couple of startups I had the pleasure to work with, I needed to ensure that the services I built processed each request to completion before the application was either restarted or scaled down. In this post I would like to explain how to handle all that with a graceful shutdown procedure that hooks into the process shutdown signals.

We'll be following these main threads:

  1. How are processes terminated?
  2. Graceful Shutdown in Go
  3. Graceful Shutdown in NodeJs
  4. Graceful Shutdown in Other Languages

Complex applications or services usually start a number of internal workers that process data more efficiently. If the service is stopped or restarted those workers need to finish handling each in-flight message before they are stopped, in order to avoid data loss or rework.

The same happens with HTTP requests. To make sure that no request is terminated mid-execution, the service should stop receiving new requests, finish handling existing ones, and close connections to dependent services before finally stopping the service. Depending on the programming language you use, this may be easier or harder to do.

In my experience, in order to make more resilient services, you have to think about redesigning the service with shutdown and reconnect procedures in mind.

For example, if you connect to a database you should account for every state change of that connection (e.g. a disconnect or reconnect), as well as define a way to close the connection pool when you want to shut down the application.

Most libraries already provide this feature, so it should be a simple matter of intercepting the termination signal and implementing a proper shutdown procedure.
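
Go's standard library is one example of a library that already provides this: http.Server.Shutdown closes the listener and then waits for in-flight requests to finish. The sketch below tries that out under some assumptions: the /slow route, the 200 ms delay and the helper names newServer and shutdownDuringRequest are made up for illustration and are not part of any real service.

```go
package main

import (
	"context"
	"errors"
	"io"
	"net"
	"net/http"
	"time"
)

// newServer builds a server with one deliberately slow route (hypothetical).
// The handler announces on started that a request is in flight.
func newServer(started chan<- struct{}) *http.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		started <- struct{}{}
		time.Sleep(200 * time.Millisecond) // simulated work
		io.WriteString(w, "done")
	})
	return &http.Server{Handler: mux}
}

// shutdownDuringRequest starts the server, fires one slow request and calls
// Shutdown while that request is still being handled. It returns the body
// the client received.
func shutdownDuringRequest() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	started := make(chan struct{}, 1)
	srv := newServer(started)
	serveErr := make(chan error, 1)
	go func() { serveErr <- srv.Serve(ln) }()

	type result struct {
		body string
		err  error
	}
	resc := make(chan result, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String() + "/slow")
		if err != nil {
			resc <- result{err: err}
			return
		}
		defer resp.Body.Close()
		body, err := io.ReadAll(resp.Body)
		resc <- result{body: string(body), err: err}
	}()

	select {
	case <-started: // the request is now being handled
	case res := <-resc: // the request failed before reaching the handler
		return res.body, res.err
	}

	// Stop accepting new connections and wait for the in-flight request,
	// but never longer than five seconds.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return "", err
	}
	// Serve returns http.ErrServerClosed once Shutdown has been called.
	if err := <-serveErr; !errors.Is(err, http.ErrServerClosed) {
		return "", err
	}
	res := <-resc
	return res.body, res.err
}
```

In a real service the call to Shutdown would sit in the signal handler discussed later in this post, and the timeout should be shorter than the grace period of whatever manages the process.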

1. How are processes terminated?

Before we talk about gracefully shutting down a service, we first need to understand how the system initiates a shutdown.

Operating systems already provide a standard way of doing this, so you will find the same flow in most tools that manage processes, such as Systemd, PM2 or Docker. Likewise, if you're using an existing web server solution, you should check its documentation on shutdown procedures. Now let's look closer at what this flow looks like:

  1. The termination of any process is triggered by a signal, which the kernel delivers to the target process.
  2. The operating system wants to handle process termination as cleanly as possible. In order to achieve this, it will first send a SIGTERM signal to notify the process that it should begin its shutdown procedure.
  3. The OS waits a limited amount of time for the process to terminate on its own. If the process is still running when the deadline is reached, a second signal, SIGKILL, is sent. The process never receives this signal and cannot react to it: the kernel kills the process directly and cleans up any residual resources it used.

The same flow is used by the process managers mentioned above. When they need to terminate a running process they will first send SIGTERM, wait a few seconds (90 by default for Systemd) and then kill the monitored process with SIGKILL.
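
To make the flow concrete, here is a rough sketch of what such a process manager does, written in Go for a Unix-like system. The function names stopGracefully and stopShell are hypothetical, and real process managers do considerably more (restarts, logging, process groups).

```go
package main

import (
	"os/exec"
	"syscall"
	"time"
)

// stopGracefully sends SIGTERM to an already started child process and waits
// up to grace for it to exit. If the deadline passes it falls back to SIGKILL.
// It reports whether the child exited within the grace period.
func stopGracefully(cmd *exec.Cmd, grace time.Duration) (bool, error) {
	exited := make(chan struct{})
	go func() {
		cmd.Wait() // reaps the child; a non-nil error is expected after a signal
		close(exited)
	}()

	// Step 1: ask the process to shut down.
	if err := cmd.Process.Signal(syscall.SIGTERM); err != nil {
		return false, err
	}

	// Step 2: give it a bounded amount of time to finish.
	select {
	case <-exited:
		return true, nil
	case <-time.After(grace):
		// Step 3: SIGKILL cannot be caught or ignored by the child.
		// An error here usually means the child exited in the meantime.
		cmd.Process.Kill()
		<-exited
		return false, nil
	}
}

// stopShell is a hypothetical helper for trying the flow out: it runs a
// shell script as the child and immediately asks it to stop.
func stopShell(script string, grace time.Duration) (bool, error) {
	cmd := exec.Command("sh", "-c", script)
	if err := cmd.Start(); err != nil {
		return false, err
	}
	return stopGracefully(cmd, grace)
}
```

For example, stopShell("exec sleep 30", 5*time.Second) should report true, because sleep does not handle SIGTERM and therefore exits right away. A child that ignores SIGTERM would instead be killed once the grace period runs out.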

If you're not using a process manager and you're simply pressing CTRL+C in the terminal then the SIGINT signal is sent instead.

Shutdown Caveats

On the other hand, if you're not building your own HTTP server, as you typically would in NodeJS or Go, but are relying instead on Nginx or Apache to serve your app, you should know that these servers expect different signals to start a graceful shutdown.

For Nginx you should send SIGQUIT, while for Apache you should send SIGWINCH. Either signal triggers a graceful shutdown: the server stops accepting new requests and waits for existing ones to finish, instead of terminating instantly. Check the linked pages for more information regarding the signals handled by these web servers.
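
If you need to send these signals from your own tooling, a small sketch in Go could look like the one below. The function names and the "nginx" and "apache" keys are assumptions made for this example, and pid is expected to be the server's master process. From a shell, nginx -s quit and apachectl -k graceful-stop achieve the same thing.

```go
package main

import (
	"fmt"
	"syscall"
)

// gracefulStopSignal returns the signal that asks the named web server to
// finish in-flight requests before exiting.
func gracefulStopSignal(server string) (syscall.Signal, error) {
	switch server {
	case "nginx":
		return syscall.SIGQUIT, nil
	case "apache":
		return syscall.SIGWINCH, nil
	default:
		return 0, fmt.Errorf("no graceful-stop signal known for %q", server)
	}
}

// requestGracefulStop sends that signal to the given process ID
// (Unix-like systems only).
func requestGracefulStop(server string, pid int) error {
	sig, err := gracefulStopSignal(server)
	if err != nil {
		return err
	}
	return syscall.Kill(pid, sig)
}
```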

If you're using Docker (or any container service) for your deployments, it may be wise to also check out this article which shows how to properly start a process in order to receive these signals.

Next, let's look at some examples of how this can be achieved in various programming languages. Since Go and NodeJS are the two languages I use the most day to day, I will start with them, but you can also find links to articles on how this can be achieved in other languages.

2. Graceful Shutdown in Go

Since the first thing we need to do to support a graceful shutdown scenario in Go is to intercept the termination signals, let's get that out of the way.

To listen for a signal, I use the os/signal package to register a channel on which a message is sent when the process receives one of the chosen signals.

package main

import (
	"os"
	"os/signal"
	"syscall"
)

// Process godoc
type Process struct {
	// ... to be defined
}

func NewProcess() *Process {
	return &Process{
		// to be defined later
	}
}

func main() {
	// ... initialize and start the service in separate goroutines
	process := NewProcess()
	// then wait for shutdown signal
	process.WaitForSignal()
}

func (process *Process) WaitForSignal() {
	// create signal notification channel
	sigc := make(chan os.Signal, 1)
	// listen for the termination signals SIGINT (os.Interrupt) and SIGTERM
	signal.Notify(sigc, os.Interrupt, syscall.SIGTERM)
	// wait until a termination signal is received
	<-sigc
	// Do some cleanup
	// ...
}
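
Since Go 1.16 the standard library also offers signal.NotifyContext, which wraps the same mechanism in a context that is cancelled when one of the signals arrives. A minimal sketch, assuming a Unix-like system; shutdownContext and signalSelf are hypothetical names, and signalSelf exists only to try the code out without a process manager.

```go
package main

import (
	"context"
	"os"
	"os/signal"
	"syscall"
)

// shutdownContext returns a context that is cancelled as soon as the process
// receives SIGINT or SIGTERM. Calling stop unregisters the signal handling.
func shutdownContext() (ctx context.Context, stop context.CancelFunc) {
	return signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
}

// signalSelf sends SIGTERM to the current process, the way a process
// manager would. It is only meant for experimenting with shutdownContext.
func signalSelf() error {
	return syscall.Kill(syscall.Getpid(), syscall.SIGTERM)
}
```

In main you would call shutdownContext, defer stop, block on <-ctx.Done() and then run the cleanup. The same context can be handed to anything that accepts a context.Context, so one signal cancels all of it.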

Apart from dependencies, a typical Go application may have several goroutines or trees of goroutines started that need to be stopped in a certain order.

Shutting down the application requires that we cancel each goroutine and wait for them to finish cleaning up.

You can define a new type for each worker, as well as define methods to stop the worker and signal when it has finished with the cleanup.

Below is an example of how you can do this:

package main

import (
	"sync"
)

// define Worker type
type Worker struct {
	// channel on which to signal shutdown
	quit      chan bool
	// wait group to use
	WaitGroup *sync.WaitGroup
	// define some queue to receive data to process
	Queue     chan int
}

func NewWorker(waitGroup *sync.WaitGroup) *Worker {
	return &Worker{
		quit: make(chan bool),
		WaitGroup: waitGroup,
		Queue: make(chan int), // any data type relevant for your service
	}
}

func (worker *Worker) Run() {
	// increase number of started goroutines
	worker.WaitGroup.Add(1)
	// start goroutine
	go func() {
		// mark goroutine as done when stopped
		defer worker.WaitGroup.Done()
		// start processing loop
		for {
			select {
			case msg := <-worker.Queue:
				// process the message (replace this line with real work)
				_ = msg
			case <-worker.quit:
				// get stop signal
				return
			}
		}
	}()
}

func (worker *Worker) Stop() {
	worker.quit <- true
}

To signal the worker termination, I used a bool chan that we notify via the Stop method. Another way to do something similar would be to use a context created or passed down when the worker is executed.
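
For comparison, the context-based variant could be sketched as follows. ContextWorker, Processed and runAndCancel are hypothetical names introduced for this example; appending to a slice stands in for the real message handling.

```go
package main

import (
	"context"
	"sync"
)

// ContextWorker stops when its context is cancelled instead of reading
// from a dedicated quit channel.
type ContextWorker struct {
	Queue chan int
	// Processed is written by the worker goroutine; read it only after
	// the wait group has been waited on.
	Processed []int
}

func NewContextWorker() *ContextWorker {
	return &ContextWorker{Queue: make(chan int)}
}

// Run starts the processing goroutine and registers it on wg.
func (w *ContextWorker) Run(ctx context.Context, wg *sync.WaitGroup) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case msg := <-w.Queue:
				w.Processed = append(w.Processed, msg) // stand-in for real work
			case <-ctx.Done():
				return
			}
		}
	}()
}

// runAndCancel feeds msgs to a worker, cancels the context and waits for
// the worker to stop. It returns the messages the worker handled.
func runAndCancel(msgs ...int) []int {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	w := NewContextWorker()
	w.Run(ctx, &wg)
	for _, m := range msgs {
		w.Queue <- m // unbuffered: returns once the worker has taken the message
	}
	cancel()
	wg.Wait()
	return w.Processed
}
```

One practical difference: cancelling a shared context stops every worker that was started with it, whereas the quit channel approach needs one Stop call per worker.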

To wait until the worker stops I use a WaitGroup to register the goroutine and unregister it at the end.

The cleanup code might look something like this:

// Process godoc
type Process struct {
	// ... some other fields or dependencies (config, db connection, etc)
	waitGroup *sync.WaitGroup
	worker    *Worker
}

func NewProcess() *Process {
	// the parent creates the wait group on which all workers register
	wg := &sync.WaitGroup{}
	return &Process{
		waitGroup: wg,
		// pass the wait group to each worker
		worker: NewWorker(wg),
	}
}

func (process *Process) WaitForSignal() {
	// ... wait for signal
	// signal internal processes to terminate gracefully
	process.worker.Stop()
	// wait until all of them have terminated
	process.waitGroup.Wait()
}
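
Keep in mind that waitGroup.Wait() blocks for as long as any worker is still running, while the process manager will only wait for its own deadline before sending SIGKILL. One way to stay within that deadline is to bound the wait. A sketch, where waitTimeout and drain are hypothetical helpers and drain merely simulates workers that need some time to finish:

```go
package main

import (
	"sync"
	"time"
)

// waitTimeout waits for wg but gives up after timeout.
// It reports whether all goroutines finished in time.
func waitTimeout(wg *sync.WaitGroup, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// drain starts n goroutines that each need work to finish, then waits at
// most grace for all of them.
func drain(n int, work, grace time.Duration) bool {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(work) // stand-in for finishing in-flight messages
		}()
	}
	return waitTimeout(&wg, grace)
}
```

When waitTimeout returns false, the helper goroutine inside it keeps waiting in the background. That is acceptable here because the process is about to exit anyway, but it is a good moment to log which work was abandoned.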

3. Graceful Shutdown in NodeJs

In NodeJS, a similar process can be established to gracefully shut down your service. First, let's consider we have the following service:

const express = require('express');
const http = require('http');

const app = express();
// ... connect to other services, like a DB, Message Broker, etc.
const nc = require('nats').connect();

// ... add some routes to the server

// Start server
const server = http.createServer(app);
server.listen(3000, function () {
  console.log('App listening on port 3000!')
});

// @todo add graceful shutdown handler
// @todo bind handler to process events

Let's also assume that the user can make certain requests to the HTTP server, which are then processed and finally published over NATS to another service on a channel/topic.

For this kind of scenario it is important to ensure that the following are true on exit:

  • All received requests are processed to completion.
  • Any buffered messages in the message broker are sent over the wire before the service exits.
  • The server stops accepting new requests while it's closing.
  • All connections to dependent services are properly closed in the right order.

To do this, we can define a function that knows how to shut down the service. This function might look like this:

// ... previous code
// add graceful shutdown handler
function shutdown() {
	// shutdown http server first to stop accepting new connections
	// and wait for each request to finish
	server.close(async () => {
		// flush any pending messages to the message broker
		await nc.flush();
		// close message broker connection
		await nc.close();
		// exit the process with success
		process.exit();
	});
}

Note that if you have persistent connections to the service, those might not close simply by calling server.close. Instead, you should use a package that handles such cases. See this article for more details.

Now that we have our shutdown handler, all that remains is to hook into a process event that is triggered when the proper signal is received. Luckily, the process object is an EventEmitter, so we can simply use something like this to hook into these signals:

// listen for both signals to trigger a shutdown
process.on('SIGINT', shutdown);
process.on('SIGTERM', shutdown);

As you can see, it's not complicated to add a cleanup step to your service shutdown procedure, but it does make a huge impact as your service evolves.

Key Takeaway: always keep the shutdown cleanup code up to date as you add more dependencies or as the inner workings of your service change, in order to ensure no data is lost or corrupted.

4. Graceful Shutdown in Other Languages

We've discussed how to handle a graceful shutdown of a service in Go and NodeJS, but what about other languages?

Well, there are some really good articles that explain how to properly do this in most languages. To make this article complete I'm adding here a few of them for some of the most popular programming languages:

  1. Graceful shutdown in Java: source 1, source 2;
  2. Graceful shutdown in Elixir/Erlang;
  3. Graceful shutdown in Ruby.

Feel free to send us a message if you have similar links for the programming languages you use daily and we will add them to the list above.

Closing Thoughts

Building and maintaining applications that run in production requires a lot of attention to detail. You want those systems to be as reliable as possible and support horizontal scaling in order to make efficient use of available resources.

In this article I talked about how to hook into the termination signals of a web application and start a graceful shutdown procedure. That procedure ensures that any messages or requests are processed to completion before the service exits, which avoids data loss or corruption caused by termination.

If you still have questions, I warmly welcome them in the comments section ☺️. Oh and if you learned something in this article, maybe you want to learn some more by subscribing to our newsletter (check below).


by Cosmin Harangus

CTO Around25
  • Cluj Napoca, Romania
