At the last couple of startups I had the pleasure of working with, I needed to ensure that the services I built processed each request to completion before the application was restarted or scaled down. In this post I explain how to do that by hooking into the process shutdown signals and running a graceful shutdown procedure.
We'll be following these main threads:
Complex applications or services usually start a number of internal workers that process data more efficiently. If the service is stopped or restarted those workers need to finish handling each in-flight message before they are stopped, in order to avoid data loss or rework.
The same happens with HTTP requests. To make sure that no request is terminated mid-execution, the service should stop receiving new requests, finish handling existing ones, and close connections to dependent services before finally stopping the service. Depending on the programming language you use, this may be easier or harder to do.
In my experience, in order to make more resilient services, you have to think about redesigning the service with shutdown and reconnect procedures in mind.
For example, if you connect to a database you should account for every state change of that connection (e.g. a disconnect or reconnect), and define a way to close the connection pool when the application shuts down.
Most libraries already provide this feature, so it should be a simple matter of intercepting the termination signal and implementing a proper shutdown procedure.
Before we can talk about gracefully shutting down a service, we first need to understand how a shutdown procedure is initiated by the system.
Since operating systems already provide a standard way of doing this, you will find the same flow in most tools that manage processes, such as Systemd, PM2 or Docker. Likewise, if you're using an existing web server, check its documentation on shutdown procedures. Now let's take a closer look at what this flow looks like:
First, the SIGTERM signal is sent to notify the process that it should begin its shutdown procedure.
If the process has not exited after a grace period, the SIGKILL signal is sent, without the process ever receiving or being able to react to this signal.
The same flow is used by the process managers mentioned above. When they need to terminate a running process they first send SIGTERM, wait for a grace period (90 seconds by default for Systemd), and then kill the monitored process with SIGKILL.
If you're not using a process manager and you're simply pressing CTRL+C in the terminal, then the SIGINT signal is sent instead.
On the other hand, if you're not building your own HTTP server, as you would with NodeJS or Go, and are instead relying on Nginx or Apache to serve your app, you should know that a graceful shutdown of those servers is triggered by different signals.
For Nginx you should send SIGQUIT, while for Apache you should send SIGWINCH, in order to trigger a graceful shutdown: the server stops accepting new requests and waits for existing ones to finish instead of terminating instantly. Check the linked pages for more information regarding the handled signals in these web servers.
If you're using Docker (or any container service) for your deployments, it may be wise to also check out this article which shows how to properly start a process in order to receive these signals.
Next, let's look at some examples of how this can be achieved in various programming languages. Since Go and NodeJS are the two languages I use most in my day to day, I will start with them, but you can also find links to some articles on how this can be achieved in other languages.
Since the first thing we need to do to support a graceful shutdown scenario in Go is to intercept the termination signals, let's get that out of the way.
To listen for a signal, I use the os/signal package to register a channel on which a message is sent when the process receives one of the chosen signals.
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

// Process holds the service state and its dependencies.
type Process struct {
	// ... to be defined
}

// NewProcess creates the process wrapper.
func NewProcess() *Process {
	return &Process{
		// to be defined later
	}
}

func main() {
	// ... initialize and start the service in separate goroutines
	process := NewProcess()

	// then wait for the shutdown signal
	process.WaitForSignal()
}

// WaitForSignal blocks until a termination signal is received.
func (process *Process) WaitForSignal() {
	// create signal notification channel
	sigc := make(chan os.Signal, 1)

	// listen for the termination signals SIGINT (os.Interrupt) and SIGTERM
	signal.Notify(sigc, os.Interrupt, syscall.SIGTERM)

	// wait until a termination signal is received
	sig := <-sigc
	log.Printf("received %s, shutting down", sig)

	// Do some cleanup
	// ...
}
Apart from dependencies, a typical Go application may have several goroutines or trees of goroutines started that need to be stopped in a certain order.
Shutting down the application requires that we cancel each goroutine and wait for them to finish cleaning up.
You can define a new type for each worker, as well as define methods to stop the worker and signal when it has finished with the cleanup.
Below is an example of how you can do this:
package main

import (
	"sync"
)

// Worker processes messages from a queue until it is told to stop.
type Worker struct {
	// channel on which to signal shutdown
	quit chan bool
	// wait group shared with the parent process
	WaitGroup *sync.WaitGroup
	// queue from which to receive data to process
	Queue chan int
}

func NewWorker(waitGroup *sync.WaitGroup) *Worker {
	return &Worker{
		quit:      make(chan bool),
		WaitGroup: waitGroup,
		Queue:     make(chan int), // any data type relevant for your service
	}
}

func (worker *Worker) Run() {
	// increase number of started goroutines
	worker.WaitGroup.Add(1)

	// start goroutine
	go func() {
		// mark goroutine as done when stopped
		defer worker.WaitGroup.Done()

		// start processing loop
		for {
			select {
			case msg := <-worker.Queue:
				// process the message
				_ = msg
			case <-worker.quit:
				// got the stop signal
				return
			}
		}
	}()
}

// Stop blocks until the worker goroutine has received the stop signal.
func (worker *Worker) Stop() {
	worker.quit <- true
}
To signal the worker termination, I used a bool chan that we notify via the Stop method. Another way to do something similar would be to use a context created or passed down when the worker is executed.
To wait until the worker stops I use a WaitGroup to register the goroutine and unregister it at the end.
The cleanup code might look something like this:
// Process holds the service state and its dependencies.
type Process struct {
	// ... some other fields or dependencies (config, db connection, etc)
	waitGroup *sync.WaitGroup
	worker    *Worker
}

func NewProcess() *Process {
	// the parent creates the wait group on which all workers register
	wg := &sync.WaitGroup{}

	return &Process{
		waitGroup: wg,
		// pass the wait group to each worker
		worker: NewWorker(wg),
	}
}

func (process *Process) WaitForSignal() {
	// ... wait for signal

	// signal internal processes to terminate gracefully
	process.worker.Stop()

	// wait until all of them have terminated
	process.waitGroup.Wait()
}
In NodeJS, a similar process can be established to gracefully shut down your service. First, let's consider we have the following service:
const express = require('express');
const http = require('http');

const app = express();

// ... connect to other services, like a DB, Message Broker, etc.
const nc = require('nats').connect();

// ... add some routes to the server

// Start server
const server = http.createServer(app);
server.listen(3000, function () {
  console.log('App listening on port 3000!');
});

// @todo add graceful shutdown handler
// @todo bind handler to process events
Let's also assume that the user can make certain requests to the HTTP server, which are processed and then published over NATS to another service on a channel/topic.
For this kind of scenario it is important to ensure that the following are true on exit:
To do this, we can define a function that knows how to shut down the service. This function might look like this:
// ... previous code

// add graceful shutdown handler
function shutdown() {
  // shutdown http server first to stop accepting new connections
  // and wait for each request to finish
  server.close(async () => {
    // flush any pending messages to the message broker
    await nc.flush();
    // close message broker connection
    await nc.close();
    // exit the process with success
    process.exit();
  });
}
Note that if you have persistent connections to the service, those might not close simply by calling server.close. Instead, you should use a package that handles such cases. See this article for more details.
Now that we have our shutdown handler, all that remains is to hook into a process event that is triggered when the proper signal is received. Luckily, the process object is an EventEmitter, so we can simply use something like this to hook into these signals:
// listen for both signals to trigger a shutdown
process.on('SIGINT', shutdown);
process.on('SIGTERM', shutdown);
As you can see, it's not complicated to add a cleanup step to your service shutdown procedure, and it makes a huge difference as your service evolves.
Key Takeaway: always keep the shutdown cleanup code up to date as you add more dependencies or as the inner workings of your service change, in order to ensure no data is lost or corrupted.
We've discussed how to handle a graceful shutdown of a service in Go and NodeJS, but what about other languages?
Well, there are some really good articles that explain how to properly do this in most languages. To make this article complete I'm adding here a few of them for some of the most popular programming languages:
Feel free to send us a message if you have similar links for the programming languages you use daily and we will add them to the list above.
Building and maintaining applications that run in production requires a lot of attention to detail. You want those systems to be as reliable as possible and support horizontal scaling in order to make efficient use of available resources.
In this article I talked about how to hook into the termination signals of a web application and start a graceful shutdown procedure that ensures that any messages or requests are processed to completion before the service exits in order to avoid data loss or corruption due to termination.