Cluster Daemon - Creating an Automated Failover System [part 1]


Introducing clustd, a new project for failover systems. The goal is to provide a fully automated cluster service across multiple systems in case a system goes down for either maintenance or an unexpected event.

Background

When @steemdunk had an hour of downtime, I made this project a high priority so that downtime becomes a thing of the past. It will be used to keep the service alive with minimal impact when a server goes down.

This has been a work in progress alongside some other projects for quite a while, and it was time to publish the first part of this project. It has been rewritten a few times to get the design I wished to see.

Inner workings

Connections are made through WebSockets. This simplifies connection management internally and avoids repeating the initial handshake on every request, which can get expensive on a server.

The system maintains a persistent connection to each of its remote machines. Health checking is active for every machine, and each machine is responsible for health checking its peers. Pings are sent every 1.5 seconds, allowing for a maximum of 3 seconds of downtime in the worst case.
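
As a rough illustration of that timing (not the actual clustd code), a health tracker could look like the sketch below. The `PeerHealth` class and the rule that a peer counts as down after two missed ping intervals are assumptions chosen to match the 1.5 second and 3 second figures.

```typescript
// Hypothetical sketch: the names and the two-interval timeout are assumptions.
const PING_INTERVAL_MS = 1500;              // ping every 1.5 seconds (from the post)
const DOWN_AFTER_MS = 2 * PING_INTERVAL_MS; // assumed: down after 3 seconds of silence

class PeerHealth {
  // Timestamp (ms) of the last ping reply seen from each peer.
  private lastPong: { [peer: string]: number } = {};

  // Record that a peer answered a ping at time `now` (ms).
  recordPong(peer: string, now: number): void {
    this.lastPong[peer] = now;
  }

  // A peer is alive if it answered within the allowed window.
  isAlive(peer: string, now: number): boolean {
    const seen = this.lastPong[peer];
    return seen !== undefined && now - seen < DOWN_AFTER_MS;
  }

  // List every known peer that has been silent for too long.
  downPeers(now: number): string[] {
    return Object.keys(this.lastPong).filter((p) => !this.isAlive(p, now));
  }
}
```

Passing the clock in as `now` instead of calling `Date.now()` inside the class keeps the logic easy to test without real timers.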

Running as both a server and a client, it can handle inbound and outbound connections for full connection duplexing. For efficiency and simplicity, each pair of machines keeps one connection open rather than two.
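
The post does not say how the two sides decide who opens that single connection. One possible rule, shown purely as an assumption, is that the machine with the smaller ID dials and the other one waits for the inbound connection. `shouldDial` and `peersToDial` are hypothetical names.

```typescript
// Hypothetical tie-break (an assumption, not clustd's documented behavior):
// the machine with the smaller ID opens the connection.
function shouldDial(selfId: string, peerId: string): boolean {
  return selfId < peerId;
}

// Peers this machine dials; every other peer is expected to dial in.
function peersToDial(selfId: string, allIds: string[]): string[] {
  return allIds.filter((id) => id !== selfId && shouldDial(selfId, id));
}
```

With this rule, exactly one side of every pair initiates, so no duplicate connection needs to be detected and closed afterwards.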

When a master gets disconnected, another machine automatically becomes the master based on a naive consensus algorithm. An incorrect configuration can cause an error when determining the next master for the cluster.
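
The exact algorithm is not described here, so the following is only a sketch of what a naive election could look like. It assumes every machine has a unique numeric ID in its configuration and that the reachable machine with the lowest ID wins. Duplicate IDs stand in for the kind of misconfiguration that makes the election fail. `Machine` and `electMaster` are hypothetical names.

```typescript
// Hypothetical sketch: "lowest ID among reachable machines wins" is an assumption.
interface Machine {
  id: number;     // assumed to be unique per machine in the configuration
  alive: boolean; // result of the most recent health check
}

function electMaster(machines: Machine[]): Machine {
  // A misconfigured cluster (duplicate IDs) makes the result ambiguous.
  const seen: { [id: number]: boolean } = {};
  for (const m of machines) {
    if (seen[m.id]) {
      throw new Error("duplicate machine id " + m.id);
    }
    seen[m.id] = true;
  }

  const alive = machines.filter((m) => m.alive);
  if (alive.length === 0) {
    throw new Error("no machine available to become master");
  }
  // Every machine evaluates the same rule, so all of them agree on the winner.
  return alive.reduce((best, m) => (m.id < best.id ? m : best));
}
```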

Security

This project does not use WSS/HTTPS; the approach is similar but not the same. The reason is that WSS is still subject to attack if you turn off certificate checking, and since the connections are usually made directly to an IP address instead of a domain, certificate checking would fail in most cases.

The encryption scheme is similar to SSL, with some differences. First, a handshake occurs in which the server assigns the client a random number, called a "ticket", which gets concatenated with the secret key. Second, a message counter also gets concatenated with the secret key.

This provides replay protection and per-message key rotation on top of the random IV. Attempting to replay a message results in a decryption error, and the machine's connection is deemed unreliable and closed.

The algorithm used is AES-128-GCM. GCM is an authenticated mode, so it takes care of the message MAC to ensure the message wasn't tampered with. A completely random IV is generated for each message on top of the additional key rotation protection.
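
To make the scheme more concrete, here is a minimal sketch using Node's built-in `crypto` module. It is not the project's actual code. The SHA-256 based key derivation, the 12-byte IV, and the `iv | tag | ciphertext` layout are all assumptions. Only the ideas of mixing the ticket and the counter into the key, a random IV per message, and AES-128-GCM come from the description above.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "crypto";

// Assumed derivation: hash secret, ticket and counter together, keep 128 bits.
function deriveKey(secret: string, ticket: number, counter: number): Buffer {
  return createHash("sha256")
    .update(secret + ":" + ticket + ":" + counter)
    .digest()
    .subarray(0, 16);
}

function encrypt(secret: string, ticket: number, counter: number, text: string): Buffer {
  const iv = randomBytes(12); // fresh random IV for every message
  const cipher = createCipheriv("aes-128-gcm", deriveKey(secret, ticket, counter), iv);
  const body = Buffer.concat([cipher.update(text, "utf8"), cipher.final()]);
  // Assumed wire layout: 12-byte IV, 16-byte auth tag, then the ciphertext.
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

function decrypt(secret: string, ticket: number, counter: number, message: Buffer): string {
  const iv = message.subarray(0, 12);
  const tag = message.subarray(12, 28);
  const body = message.subarray(28);
  const decipher = createDecipheriv("aes-128-gcm", deriveKey(secret, ticket, counter), iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not verify (wrong key or tampered data).
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}
```

Because the receiver derives the key from its own counter, a message captured at counter 1 and replayed when the receiver expects counter 2 fails authentication in `decrypt`, which is the point where a connection would be closed.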

Getting set up

Building

Node v9 is recommended. Knowledge of Node and npm is helpful but not required.

  1. Clone the repository: https://github.com/steemdunk/clustd.git
  2. Run npm install
  3. Run npx gulp build

Configuring

The configuration is noted in the README of the project:
https://github.com/steemdunk/clustd#configuration

It is possible to run the cluster on the same machine using different ports for local testing. This has no use in a production environment, however. ;)

There are some additional environment variables worth mentioning:

  • export DEBUG='clustd:*' enables debugging for the entire system.
  • export CLUSTD_CONFIG=./my-config.yml specifies a configuration file. By default the configuration path is ./config.yml; this variable changes the path if necessary.
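
The fallback behavior of CLUSTD_CONFIG can be pictured with the small sketch below. The helper name `resolveConfigPath` is hypothetical, and treating an empty value as unset is an assumption. Only the variable name and the ./config.yml default come from the text above.

```typescript
// Default path as stated in the post.
const DEFAULT_CONFIG_PATH = "./config.yml";

// Hypothetical helper: pick the configuration path from the environment.
function resolveConfigPath(env: { [key: string]: string | undefined }): string {
  const fromEnv = env["CLUSTD_CONFIG"];
  // Assumption: an empty value is treated the same as an unset variable.
  return fromEnv !== undefined && fromEnv !== "" ? fromEnv : DEFAULT_CONFIG_PATH;
}
```

In a Node process this would be called as `resolveConfigPath(process.env)`.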

Running

Once everything is configured appropriately, starting it up is easy: node ./out/index.js.

Sample screenshot of a 3-machine cluster, with the 3rd machine down and the 2nd machine acting as the master. Full debug output is enabled and the activity is clearly visible.

Roadmap

Drivers are next and will be implemented in the next part. They will control a system (e.g. starting and stopping a service) when a server becomes the master or a secondary.
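
Drivers do not exist yet, so the shape below is speculation about what such a hook might look like, not a preview of the real API. `Role`, `Driver`, and `ServiceDriver` are hypothetical names.

```typescript
// Hypothetical sketch of a driver contract; the real design may differ entirely.
type Role = "master" | "secondary";

interface Driver {
  // Called whenever this machine's role in the cluster changes.
  onRoleChange(role: Role): void;
}

// Example driver that keeps a service running only while this machine is master.
class ServiceDriver implements Driver {
  running = false;

  onRoleChange(role: Role): void {
    if (role === "master" && !this.running) {
      this.running = true; // a real driver would start the service here
    } else if (role === "secondary" && this.running) {
      this.running = false; // and stop it here
    }
  }
}
```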

While the cluster itself works, it is not useful on its own yet. This project is still in the alpha stages and ongoing improvements will be made as progress continues.

Check out the project: https://github.com/steemdunk/clustd



Posted on Utopian.io - Rewarding Open Source Contributors
