A production-ready backend application needs to be consistently available. While a single Node.js server can be enough for low-traffic applications, medium- to high-demand services require scaling. To address this need, Node.js offers a dedicated cluster module.
In this article, we will cover how the cluster module works and cases when it should be used. Additionally, we will see how to implement a zero-downtime strategy to guarantee the 24/7 availability of your server. Throughout all code examples, we will use version 20 of Node.js.
Basic server with cluster module
To kick off, we will create a barebones Node.js server with one route.
import http from 'http';

http
  .createServer(async (req, res) => {
    if (req.url === '/check') {
      // block the event loop for 3 seconds
      // to simulate an overloaded server
      let start = Date.now();
      while (Date.now() - start < 3000) {}
      res.writeHead(200);
      res.end(`Hello from ${process.pid}`);
      return;
    }
    res.writeHead(200);
    res.end();
  })
  .listen(8000, () => console.log('Server started at 8000'));
We deliberately block the event loop to simulate a heavy load that makes our server unresponsive. If we open http://localhost:8000/check in two separate tabs, we will see that the first request is served after roughly 3 seconds, while the second one takes 3-6 seconds depending on how quickly we opened the second tab (see the screenshot below).
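To make the simulation a little more concrete, here is a small sketch (not part of the article's example) where the blocking work is synchronous password hashing with Node's built-in crypto.pbkdf2Sync; the route, port, and parameters are purely illustrative.

import http from 'http';
import { pbkdf2Sync } from 'crypto';

http
  .createServer((req, res) => {
    if (req.url === '/hash') {
      // a CPU-bound synchronous call: the event loop is blocked
      // until the key derivation finishes, just like the while loop above
      const hash = pbkdf2Sync('some-password', 'some-salt', 500000, 64, 'sha512');
      res.writeHead(200);
      res.end(hash.toString('hex'));
      return;
    }
    res.writeHead(200);
    res.end();
  })
  .listen(8001, () => console.log('Hashing demo started at 8001'));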
Of course, the example is contrived, but something similar can happen if your server is in high demand and performs synchronous operations such as hashing or heavy data processing. In this case, we can make use of the cluster module and launch multiple instances of the Node.js server.
import cluster from 'cluster';
import http from 'http';
import { availableParallelism } from 'os';

// how many CPU cores are available
const numCPUs = availableParallelism();

// there is one primary process and n worker processes
if (cluster.isPrimary) {
  // the primary process has its own process id
  console.log(`Primary ${process.pid} is running`);
  // children are all forked from the primary process
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  // and forked worker processes have their own process ids
  console.log(`Worker ${process.pid} started`);
  // workers can share any TCP connection;
  // in this case it is an HTTP server
  http
    .createServer(async (req, res) => {
      if (req.url === '/check') {
        // block the event loop for 3 seconds
        // to simulate an overloaded server
        let start = Date.now();
        while (Date.now() - start < 3000) {}
        res.writeHead(200);
        res.end(`Hello from ${process.pid}`);
        return;
      }
      res.writeHead(200);
      res.end();
    })
    .listen(8000);
}
Here, with the availableParallelism method from the os module, we get the number of cores on our machine, and in the primary process we create one worker process per core. Note that all of them are separate processes with their own memory space and process ID. As you can see below, my machine has ten cores, so I end up with one primary process and ten worker processes.
Now if we open several tabs one after another, we will see that every request is handled in about 3 seconds! Isn't it fascinating that with such a small tweak we can scale our application and serve more users?
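To make the point about separate memory spaces more tangible, here is a tiny sketch (again, not part of the main example): each worker keeps its own request counter, and one worker's counter never reflects requests served by the other.

import cluster from 'cluster';
import http from 'http';

if (cluster.isPrimary) {
  for (let i = 0; i < 2; i++) {
    cluster.fork();
  }
} else {
  // this variable lives in the worker's own memory space,
  // so every worker counts only the requests it served itself
  let requestCount = 0;
  http
    .createServer((req, res) => {
      requestCount += 1;
      res.writeHead(200);
      res.end(`Worker ${process.pid} has served ${requestCount} request(s)`);
    })
    .listen(8000);
}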
Interestingly, every worker appears to listen on the same port, 8000, which would normally throw an exception with code EADDRINUSE (address already in use). So how does Node achieve that? In a cluster, the primary process is responsible not only for launching the worker processes but also for distributing incoming network connections between them. Essentially, it acts like a load balancer.
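The distribution strategy is even configurable. On most platforms the primary uses round-robin by default, but you can set cluster.schedulingPolicy before the first fork; the sketch below just makes the default explicit (cluster.SCHED_NONE would instead leave the distribution to the operating system).

import cluster from 'cluster';
import http from 'http';

// must be set before the first fork();
// cluster.SCHED_RR is round-robin, cluster.SCHED_NONE leaves it to the OS
cluster.schedulingPolicy = cluster.SCHED_RR;

if (cluster.isPrimary) {
  for (let i = 0; i < 2; i++) {
    cluster.fork();
  }
} else {
  http
    .createServer((req, res) => {
      res.writeHead(200);
      res.end(`Hello from ${process.pid}`);
    })
    .listen(8000);
}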
Everything looks great now, but we should never forget that some unexpected errors may occur even in a well-written and tested app.
import cluster from 'cluster';
import http from 'http';

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} is running`);
  // fork only 2 worker processes
  for (let i = 0; i < 2; i++) {
    cluster.fork();
  }
  // the primary process listens for the exit event of the worker processes
  cluster.on('exit', (worker, code, signal) => {
    console.log(`Worker ${worker.process.pid} died`);
    // if a worker process dies, a new one is forked
    cluster.fork();
  });
} else {
  console.log(`Worker ${process.pid} started`);
  http
    .createServer(async (req, res) => {
      if (req.url === '/check') {
        // simulate some unexpected error
        if (Math.random() > 0.5) {
          return process.exit(1);
        }
        let start = Date.now();
        while (Date.now() - start < 3000) {}
        res.writeHead(200);
        res.end(`Hello from ${process.pid}`);
        return;
      }
      res.writeHead(200);
      res.end();
    })
    .listen(8000);
}
In the code above we spawn only two worker processes for simplicity's sake. Our workers will exit at random, but now the primary process listens for that event and immediately forks a new worker. So while some requests may still fail and crash a server instance, this no longer affects the overall availability of the service.
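One edge case worth guarding against (not handled in the example above) is a crash loop: if a bug makes workers die immediately on startup, the primary would keep forking replacements forever. A rough sketch of a restart limit, with the window and threshold values picked arbitrarily:

import cluster from 'cluster';
import http from 'http';

const RESTART_WINDOW_MS = 60000; // assumption: look at the last minute
const MAX_RESTARTS = 10;         // assumption: allow at most 10 restarts per window
let restartTimes = [];

if (cluster.isPrimary) {
  for (let i = 0; i < 2; i++) {
    cluster.fork();
  }
  cluster.on('exit', worker => {
    const now = Date.now();
    // keep only the restarts that happened within the window
    restartTimes = restartTimes.filter(t => now - t < RESTART_WINDOW_MS);
    if (restartTimes.length >= MAX_RESTARTS) {
      console.error('Workers are crashing too quickly, not forking a replacement');
      return;
    }
    restartTimes.push(now);
    console.log(`Worker ${worker.process.pid} died, forking a replacement`);
    cluster.fork();
  });
} else {
  // the worker part stays the same as in the example above
  http
    .createServer((req, res) => {
      res.writeHead(200);
      res.end(`Hello from ${process.pid}`);
    })
    .listen(8000);
}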
It looks even better than before, yet another question arises: how do we update a running server? We need to restart it, but there could be unfinished requests in flight. We don't want to cancel those requests and make users see errors. For that, we need to implement a graceful shutdown. Before implementing it, please install:
npm i http-terminator
import cluster from 'cluster';
import http from 'http';
import { createHttpTerminator } from 'http-terminator';

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} is running`);
  // fork only 2 worker processes
  for (let i = 0; i < 2; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker, code, signal) => {
    if (signal) {
      console.log(`Worker ${worker.process.pid} was killed by signal: ${signal}`);
    } else if (code !== 0) {
      console.log(`Worker ${worker.process.pid} exited with error code: ${code}`);
    } else {
      console.log(`Worker ${worker.process.pid} exited successfully!`);
    }
    cluster.fork();
  });
  cluster.on('disconnect', worker => {
    console.log(`Worker ${worker.process.pid} has disconnected`);
  });
} else {
  console.log(`Worker ${process.pid} started`);
  // uncomment this line to see the updated code
  // console.log('Updated code');
  const server = http
    .createServer(async (req, res) => {
      if (req.url === '/check') {
        console.log(`request on ${process.pid}`);
        await new Promise(resolve => setTimeout(resolve, 5000));
        res.writeHead(200);
        res.end(`Hello from ${process.pid}`);
        return;
      }
      res.writeHead(200);
      res.end();
    })
    .listen(8000);
  const httpTerminator = createHttpTerminator({
    server,
    // wait 10 seconds for the
    // server to close all connections
    gracefulTerminationTimeout: 10000,
  });
  // listen for the SIGINT signal and exit gracefully
  process.on('SIGINT', async () => {
    console.log('Starting shutdown procedures...');
    await httpTerminator.terminate();
    process.disconnect();
  });
}
Here we make use of the http-terminator package, which lets us stop accepting new connections while finishing the requests that have already been received (or terminating them after 10 seconds). To stop a running process, we will send the SIGINT signal from the terminal using the following command (replace <pid> with your pid):
kill -SIGINT <pid>
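A small aside: besides SIGINT, container runtimes and init systems usually send SIGTERM when they want a process to stop, so in practice you may want to wire the same shutdown routine to both signals. A standalone sketch (outside the cluster, so it exits directly instead of calling process.disconnect):

import http from 'http';
import { createHttpTerminator } from 'http-terminator';

const server = http
  .createServer((req, res) => {
    res.writeHead(200);
    res.end(`Hello from ${process.pid}`);
  })
  .listen(8000);

const httpTerminator = createHttpTerminator({
  server,
  gracefulTerminationTimeout: 10000,
});

// one shutdown routine, reused for both signals
const shutdown = async signal => {
  console.log(`Received ${signal}, starting shutdown procedures...`);
  await httpTerminator.terminate();
  // inside a cluster worker you would call process.disconnect() instead
  process.exit(0);
};

process.on('SIGINT', () => shutdown('SIGINT'));
process.on('SIGTERM', () => shutdown('SIGTERM'));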
So the steps for the next experiment are:
launch the server
open http://localhost:8000/check in a browser tab
send SIGINT to the process that received the request
open http://localhost:8000/check in a second browser tab
Either way, you will see that the process we are reloading no longer receives any HTTP connections, and you will still get the response to the first request. The second request will also be fulfilled (although, if you don't have dev tools open with cache disabled in the network tab, it may be delayed until the old process has shut down). Additionally, you can change the code, and the newly reloaded instance will pick up the changes (try it out).
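If you want to automate what we just did by hand, the primary process itself can drive a rolling restart. The standalone sketch below (not wired into the earlier example, and with the re-fork-on-crash logic omitted for brevity) replaces workers one at a time when the primary receives SIGUSR2; trigger it with kill -SIGUSR2 <primary pid>.

import cluster from 'cluster';
import http from 'http';
import { createHttpTerminator } from 'http-terminator';

if (cluster.isPrimary) {
  for (let i = 0; i < 2; i++) {
    cluster.fork();
  }
  // rolling restart: replace workers one at a time, so at least
  // one worker is always accepting connections
  process.on('SIGUSR2', async () => {
    for (const id of Object.keys(cluster.workers)) {
      const oldWorker = cluster.workers[id];
      // start the replacement first and wait until it accepts connections
      const newWorker = cluster.fork();
      await new Promise(resolve => newWorker.once('listening', resolve));
      // then ask the old worker to shut down gracefully
      oldWorker.process.kill('SIGINT');
      await new Promise(resolve => oldWorker.once('exit', resolve));
    }
    console.log('Rolling restart finished');
  });
} else {
  const server = http
    .createServer(async (req, res) => {
      await new Promise(resolve => setTimeout(resolve, 5000));
      res.writeHead(200);
      res.end(`Hello from ${process.pid}`);
    })
    .listen(8000);
  const httpTerminator = createHttpTerminator({
    server,
    gracefulTerminationTimeout: 10000,
  });
  process.on('SIGINT', async () => {
    await httpTerminator.terminate();
    process.disconnect();
  });
}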
We've done a great job of bulletproofing our server. We can now sleep well knowing that if something goes wrong, its availability will not be affected. Additionally, we can update our server at any time, regardless of the current workload. Still, we have deliberately left out some edge cases, because now we will move on to pm2, a process manager for Node.js that will significantly simplify our workflow.
PM2
pm2 abstracts away a lot of the extra code we have written so far, but it was important to understand how it operates under the hood, namely by using the cluster module. Additionally, if your application crashes completely, pm2 will restart it. So, let's install pm2 globally first:
npm install pm2@latest -g
It is very easy to launch the server using pm2; just execute:
pm2 start app.js
// app.js
import http from 'http';
import { createHttpTerminator } from 'http-terminator';

const server = http
  .createServer(async (req, res) => {
    if (req.url === '/check') {
      console.log(`request on ${process.pid}`);
      await new Promise(resolve => setTimeout(resolve, 5000));
      res.writeHead(200);
      res.end(`Hello from ${process.pid}`);
      return;
    }
    res.writeHead(200);
    res.end();
  })
  .listen(8000);

const httpTerminator = createHttpTerminator({
  server,
  gracefulTerminationTimeout: 10000,
});

console.log(`Worker ${process.pid} started`);

process.on('SIGINT', async () => {
  console.log('Starting shutdown procedures...');
  await httpTerminator.terminate();
  process.disconnect();
});
To see all running processes:
pm2 list
And to check the logs:
pm2 logs app
To delete a previously created process use:
pm2 delete app
By default, pm2 launches the process in 'fork' mode, which means a single instance of the application. If you wish to utilize the cluster module we discussed above, you should launch it in 'cluster' mode via:
pm2 start app.js -i 2 --kill-timeout 12000
The number passed after the -i flag specifies the number of worker instances (set it to max to launch one process per core), and for our educational purposes we set --kill-timeout to 12 seconds.
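If you prefer configuration files over command-line flags, the same settings can live in a pm2 ecosystem file. Below is a minimal sketch mirroring the command above (option names as in the pm2 docs; double-check them against your pm2 version, and note that an ESM project may need the .cjs extension for this file):

// ecosystem.config.js
module.exports = {
  apps: [
    {
      name: 'app',
      script: 'app.js',
      instances: 2,         // same as -i 2
      exec_mode: 'cluster', // use the cluster module under the hood
      kill_timeout: 12000,  // same as --kill-timeout 12000 (milliseconds)
    },
  ],
};

You can then launch everything with pm2 start ecosystem.config.js.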
After starting the app in cluster mode, you should see the following:
With that, we can conduct our final experiment: open http://localhost:8000/check and execute in the terminal:
pm2 reload app
You will see that our application is reloaded step-wise, one instance at a time. An instance that is still handling a request postpones its reload until the request is finished. You can remove the part of the code that handles the graceful shutdown and see that, without it, your request in the browser gets terminated.
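pm2 also supports the mirror image of a graceful shutdown, a graceful start: according to the pm2 documentation, setting wait_ready: true (optionally together with listen_timeout) in the ecosystem config makes pm2 wait during a reload until the new instance sends a 'ready' message before considering it up. A sketch of the application side:

import http from 'http';

http
  .createServer((req, res) => {
    res.writeHead(200);
    res.end(`Hello from ${process.pid}`);
  })
  .listen(8000, () => {
    // when started by pm2 with wait_ready: true, this tells pm2 the
    // instance is ready for traffic; process.send only exists when
    // an IPC channel is present (e.g. under pm2 or cluster)
    if (process.send) {
      process.send('ready');
    }
  });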
Conclusion
The cluster module is indispensable for production-grade applications. It provides a flexible set of operations that let you fully customize how the server is launched and shut down and how requests are processed. pm2 abstracts much of this behind a very simple set of commands and an insightful interface. You should definitely check the official pm2 documentation for more details and options.
Hopefully, the examples provided in this post will help you better grasp the core ideas and techniques of the cluster module while experimenting with and simulating different scenarios. If you have any questions or ideas on the topic, I will be happy to answer them in the comments.
Happy coding!