Scaling - Clustering in Node.js


Recently I ran into a scaling problem with my web server: it slowed down significantly under load. If I could make it use all the CPU cores, performance would improve substantially.

TL;DR

Source code can be found on GitHub, in the jsCluster repo.



JavaScript and Node.js are single-threaded by default. That means an application runs on a single CPU core and can't take advantage of the modern multicore machine it's hosted on.

Possible solutions

There are multiple ways to fix this.

  1. Run multiple instances of our server, listening on different ports, and route traffic to them from a reverse proxy (like Nginx).
    Pros:
    • Full fledged web server.
    • Best performance.
    • Advanced load balancing.
    Cons:
    • Quite difficult to set up.
    • Needs some devops skills.
    • We have to configure it differently on machines with different numbers of CPU cores.
    • Increased file descriptor and memory usage.
  2. Use the built-in cluster module of node.js to fork worker processes, all listening on the same port.
    Pros:
    • Very easy to set up.
    • Do everything from code, no configuration needed.
    • Dynamically scale according to the CPU cores available.
    • Easy communication between the processes.
    Cons:
    • Slightly lower performance.
    • We might have to take care of some advanced features ourselves, like managing the lifecycle of our processes.

Since I didn't have much time for this, and I'm not particularly fluent in devops tasks, I chose the cluster module. You can read a more detailed comparison, including a third option (iptables) here.

Our basic server

Here is the code we are starting from. It's a dead simple Express web server, with a single hello world endpoint.

let express = require('express'),  
    app = express(),
    port = 3000;

app.get('/', function (req, res) {  
    res.send('Hello World!');
});

app.listen(port);  
console.log(`application is listening on port ${port}...`);  

The cluster module

Node.js has had a built-in cluster module (docs) for quite a long time. It had some problems in its past, but the kinks got ironed out, and it became the recommended way to scale on multicore machines. It's surprisingly simple to use: we just need a couple of lines for the basic setup.

let cluster = require('cluster');  
if (cluster.isMaster) {  
    let cpus = require('os').cpus().length;
    for (let i = 0; i < cpus; i += 1) {
        cluster.fork();
    }    
} else {
    let express = require('express'),
        app = express(),
        port = 3000;
    app.get('/', function (req, res) {
        res.send('Hello World!');
    });
    app.listen(port);
    console.log(`worker ${cluster.worker.id} is listening on port ${port}...`);
}

We just have to separate the code into a master part and a worker part. The first process starts up as the master and forks as many worker processes as we have CPU cores. The cpus value counts logical cores (so a 4-core i7 with hyperthreading = 8). The worker code is basically the same as the previous example.

With this small amount of work, we gave our app the ability to use all cores. For hobby or small-scale projects, this may already be good enough. But if we want to use this in larger-scale applications, we need to take care of a few more things to make it reliable and robust.

  1. If a worker process dies, we need to fork a new one. If we don't, a long-running application might lose all its workers over time.

  2. If the workers don't die, we should kill them occasionally. I know this may sound odd, but if we don't, even a small memory leak will use up more and more memory over time, and eventually we will run out. So I kill each worker somewhere between 10K and 20K requests. (In real life, we would do this much less often.)

let cluster = require('cluster');  
if (cluster.isMaster) {  
    let cpus = require('os').cpus().length;
    for (let i = 0; i < cpus; i += 1) {
        cluster.fork();
    }
    cluster.on('exit', function (worker) {
        console.log(`worker ${worker.id} exited, respawning...`);
        cluster.fork();
    });    
} else {
    let express = require('express'),
        app = express(),
        port = 3000,
        counter = 10000 + Math.round(Math.random() * 10000);
    app.get('/', function (req, res) {
        res.send('Hello World!');
        if (counter-- === 0) {
            cluster.worker.disconnect();
        }        
    });
    app.listen(port);
    console.log(`worker ${cluster.worker.id} is listening on port ${port}...`);
}

Our server is now capable of running indefinitely, but it's still not very robust. As you can probably imagine, the more precisely we want to manage the processes, and the more edge cases we want to handle, the more complex our code will get. For example: if the newly forked processes keep dying for some reason, we get an endless fork loop that just burns CPU. The standard solution is exponential backoff: wait longer and longer before re-forking a worker. There are many edge cases like this, and if we try to handle all of them, we will probably go insane before we succeed.

Process Managers

Process managers to the rescue. Specifically, I tried PM2, but StrongLoop has similar features. This is our third option.

  1. Using a process manager.
    Pros:
    • You don't have to modify your code. At all.
    • Manages process lifecycle.
    • Much more robust than some naively coded clustering with the cluster module.
    • Gives you advanced features like process restarting, monitoring, and zero downtime reloading.
    Cons:
    • Your code doesn't really have control over your worker processes, the management is automatic.
    • Inter-process communication can only be done with a separate module.
    • A tiny bit slower than the cluster module.

Honestly, this solution was an afterthought on my part. If I had found PM2 before I wrote this post, I probably wouldn't have chosen the cluster module to start with. But no regrets, the journey is often more important than the destination.

PM2 is an npm package. Install it globally, so you can use it anywhere:

npm install pm2@latest -g  

Then, simply start our basic, non-clustered server with it, and scale to the number of cores:

pm2 start serverSingle.js -i max  

Now our processes are running, and keep running indefinitely. We can even monitor the load on the processes visually:

pm2 monit  


PM2 can do a lot more. I won't go into much detail here; you can read more about its features here. I recommend using it instead of trying to come up with your own solution, unless you really know what you are doing.

Benchmarking

For comparing the performance of the different solutions, I used ApacheBench (ab). It's a nice tool for simple stress testing, and it's already available on macOS.

I had some problems, namely running out of ephemeral ports, which I fixed by using the KeepAlive option (-k). I don't want to go into much detail about this, but you can read more here if you are interested.

The command used to run the test.
(Meaning: Do a million requests, maximum a hundred concurrent, and keep the connection alive.)

ab -n 1000000 -c 100 -k http://localhost:3000/  

Results:
(Measured on a 2015 MacBook Pro, 4 core, 2.2 GHz Intel Core i7, 16 GB RAM.)

Metric                      Single   Clustered     PM2
Time taken for tests (s)        89          27      28
Requests per second          11239       37357   35403
Transfer rate (KB/s)          2316        7698    7295
Longest request (ms)           164          70      61

As you can see:

  • The cluster module is the fastest in most metrics.
  • PM2 is just a tiny bit slower, which is very good considering the robustness and the advanced features we get compared to the cluster module.
  • The single process is not even in the ballpark. No surprise here.

Conclusion

If you need to scale, you have to do clustering. Use a process manager if you can, it saves you a lot of trouble. If you can't, or you feel that's overkill for your use case, use the cluster module. Even in the basic setup, it's far better than nothing.


If you enjoyed reading this blog post, please share it or like the Facebook page of the website to get regular updates. You can also follow us @dealwithjs.

Happy clustering!