How Consistent Hashing Helps with Load Balancing and Servers

A Guide to Consistent Hashing

Mohit Gupta
Level Up Coding

As discussed in the previous article, a Load Balancer helps greatly in distributing the load efficiently among servers.

The LB ensures high availability by always routing traffic to healthy nodes, giving clients a reliable and efficient system.

This works fine for services that do not persist data on server nodes. If the service retains data and returns a response based on that data (or that data itself is the response), then routing becomes more complex.

Let us understand this with another story.

A Simple Service

We are building a data service. The use cases are simple: the system should store key-value pairs, keep the stored data durable, and return the value for a given key.

For simplicity's sake, we shall keep the design high-level, without going into the system's internal data structures.

There is a service (DataService) that exposes APIs to store and get the data. It is exposed as a REST service.
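
To make the use case concrete, below is a minimal in-memory sketch of such a service in Python. The class and method names are illustrative assumptions, and the REST layer and real durability are omitted.

```python
class DataService:
    """A toy key-value store; a real implementation would persist data durably."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        # Store or overwrite the value for a key.
        self._store[key] = value

    def get(self, key):
        # Return the stored value, or None if the key is unknown.
        return self._store.get(key)

service = DataService()
service.put("user:42", {"name": "Mohit"})
print(service.get("user:42"))  # {'name': 'Mohit'}
```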

The service works well until, one day, traffic goes beyond the server's capacity and the server crashes.

Scaling — Service with Load Balancer

The obvious solution, as we discussed in the LB basics article, is to host DataService on more than one server and expose it through an LB. This helps the service layer scale well.

However, the data store where the actual key-value data lives is still a single node, and it gradually becomes the bottleneck.

Scaling — Database with Read Replicas

The next step is to create replicas of the database as well.

Running active-active replicas of the whole database is hard because of data inconsistency and conflict scenarios. Hence, multiple replicas are created to serve reads, while writes always go to the same master node.

Any data updates will be replicated from the master node to other read nodes.

This approach has two issues:

  • Read operations scale well, but write operations are still bound to a single write node. The system does not scale for writes.
  • Read replicas always lag behind the write node. Data is written on the master and replicated to the read nodes asynchronously, so reads can be inconsistent (stale data).

Scaling — with Sharding

One solution is to use sharding, so that multiple nodes can serve both read and write operations. The whole dataset is distributed across the shards such that each node holds a subset of the data with no overlap with the other serving nodes.

Each node is a fully functional read-write data store, but for a subset of data.

This distributes the load across multiple nodes and takes care of the scaling issues, as client requests are now served by several nodes.

However, it also needs a mechanism that ensures requests for a specific piece of data (key) always go to the same server.

This is achieved using a hash-and-mod technique. The unique key of the data (the key from the key-value pair in this case) is hashed, and a modulo is taken over the number of available servers. For a given key, this always yields the same server index, so reads and writes for that key always land on the same server, as in the sketch below.
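
As a minimal sketch of this hash-and-mod routing (the server names and count are illustrative assumptions):

```python
import hashlib

SERVERS = ["server-0", "server-1", "server-2", "server-3", "server-4"]

def shard_for(key: str) -> str:
    # Hash the key deterministically, then take a modulo over the server count.
    digest = hashlib.md5(key.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

print(shard_for("user:42"))  # the same key always maps to the same server
```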

Each node can also have standby backup nodes, or keep its backup on another shard node. This ensures there is a replica of the data that can be used if the main node goes down.

This looks awesome and solves most of the issues for scaling, redundancy, and availability.

Problem of Rehashing

This approach works fine as long as we have a fixed number of servers, which is not practical in distributed systems.

True scaling means the capability to scale up and down dynamically based on need. This means the number of servers can change at any time.

As soon as we add or remove a server, the data has to be re-hashed and re-distributed among the servers to keep the distribution balanced.

This also means that with every addition or removal of nodes, a lot of data has to be moved across the server nodes, demanding significant processing time and bandwidth. The system may not be available for that duration, which is bad.

For example, suppose keys hash into the range 1–1000 and we have 5 servers, hence 5 shards. Each shard is mapped to 200 keys.

Now if we add 5 more nodes, the shard-to-key mapping has to be realigned, and each shard holds only 100 keys. A lot of data has to be moved across the shards based on the rehashed keys.

This could be a big challenge in dynamically scalable systems, which may need to scale up and down frequently.

In real life, this could happen with trillions of records, so the cost of rehashing, and hence of data movement, is huge.
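
To get a feel for that cost, here is a small illustrative sketch (synthetic keys; the 5-to-6 and 5-to-10 server counts are just examples) counting how many keys change servers under hash-and-mod when the cluster is resized:

```python
import hashlib

def shard_index(key: str, num_servers: int) -> int:
    # The same hash-and-mod scheme as before.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_servers

keys = [f"key-{i}" for i in range(1000)]  # synthetic keys, purely for illustration

for new_size in (6, 10):
    moved = sum(1 for k in keys if shard_index(k, 5) != shard_index(k, new_size))
    # Adding a single node remaps most keys; even doubling to 10 remaps about half.
    print(f"5 -> {new_size} servers: {moved} of {len(keys)} keys must move")
```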

Scaling — Servers on a Ring (Consistent Hashing)

So what is the solution?

We need a solution where the request distribution scheme (based on hashing of keys and mod) does not depend on the number of physical servers. Then, when we add or remove servers, the data movement, and hence the impact, is minimized.

In other words, we need a hashing technique that does not depend on the number of servers, but instead gives a location that can later be mapped to a server with some additional logic.

This is where the Consistent Hashing scheme helps. It was first described by Karger et al. at MIT in 1997. It is an amazingly simple but very effective technique.

So far, we have been working with an array of servers, where hashing + mod of the key gave the index of the right server. What if we arrange the servers on a ring instead of a linear array?

The ring has many locations on it, and only some of these locations have servers mapped to them. Hashing a key now gives a location on this ring instead of a direct server index.

How it works when a request comes in (a code sketch follows the list):

  • Hashing + mod gives a location on the ring (instead of a server index).
  • The system checks whether any server is mapped at that location on the ring.
  • If a server is mapped there, the request is routed to it.
  • If no server is mapped to that location, the system moves clockwise along the ring and routes the request to the first server it finds.
  • If the search hits the end of the ring, it wraps around and continues from the beginning (index zero); the first server found there serves the request.
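
Below is a minimal Python sketch of such a ring lookup, using a sorted list of positions and binary search. The class name, the MD5-based hash, and the ring size of 600 (matching the example that follows) are assumptions for illustration.

```python
import bisect
import hashlib

RING_SIZE = 600  # number of locations on the ring, as in the example below

def ring_position(value: str) -> int:
    # Hash to a location on the ring instead of directly to a server index.
    digest = hashlib.md5(value.encode()).hexdigest()
    return int(digest, 16) % RING_SIZE

class ConsistentHashRing:
    def __init__(self):
        self.positions = []   # sorted ring locations that have a server mapped
        self.servers = {}     # ring location -> server name

    def add_server(self, name, position=None):
        # A server can be placed by hashing its name or at an explicit location.
        pos = ring_position(name) if position is None else position
        bisect.insort(self.positions, pos)
        self.servers[pos] = name

    def remove_server(self, name):
        for pos in [p for p, s in self.servers.items() if s == name]:
            self.positions.remove(pos)
            del self.servers[pos]

    def get_server(self, key):
        pos = ring_position(key)
        # Clockwise search: the first server at or after the key's location...
        idx = bisect.bisect_left(self.positions, pos)
        # ...wrapping back to the start of the ring if we run past the end.
        if idx == len(self.positions):
            idx = 0
        return self.servers[self.positions[idx]]
```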

In the above diagram, we have 600 locations on the ring, and 6 servers mapped to 6 different locations. Any request key hashed into one of the given ranges is routed to the corresponding server, as illustrated in the above table.

For example, for a key that hashes to location 1, the system finds no server mapped directly at that location. It moves clockwise to the next server on the ring, which is S2, and routes the request to S2.

What is the benefit?

Keys are mapped dynamically to a location on the ring, not to any specific server.

Servers are also mapped dynamically to a location on the ring (maybe by using another hashing technique).

So the system just needs a mechanism to connect these two dynamically calculated locations, and that is exactly what the ring provides.

Suppose that, to manage increased load after some time, we add one more server, S7, at location 525. The range 451–525 was mapped to S1 so far; now it maps to S7, so the data in this key range has to be moved from S1 to S7.

And that's it. No other data movement is required, none of the other nodes is impacted, and the rest of the system keeps functioning as it is.

The next request for any key in the range 451–525 automatically routes to S7, as it is now the next server on the ring for those keys.

This saves a lot of overhead.

Data movement impacts only 2 server nodes (S7 and S1 in the above example), instead of impacting all the servers and all the data.

It means much less data needs to be moved. Hence, the move completes in much less time and has a significantly smaller impact on clients.

It solves the previous issue of bulk data movement due to dynamic scaling requirements.
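
A rough way to verify this, reusing the ConsistentHashRing sketch above: the locations of S1 (600), S5 (375), S6 (450), and the new S7 (525) follow the running example, while the positions of S2–S4 are assumptions for illustration.

```python
ring = ConsistentHashRing()
for name, pos in [("S1", 600), ("S2", 75), ("S3", 150),
                  ("S4", 225), ("S5", 375), ("S6", 450)]:
    ring.add_server(name, pos)

keys = [f"key-{i}" for i in range(10_000)]
before = {k: ring.get_server(k) for k in keys}

ring.add_server("S7", 525)   # scale up with one extra node at location 525
after = {k: ring.get_server(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Every remapped key was on S1 and now lives on S7; no other server is touched.
assert all(before[k] == "S1" and after[k] == "S7" for k in moved)
print(f"{len(moved)} of {len(keys)} keys moved, all from S1 to S7")
```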

However, there is still one issue.

In the above example, suppose servers S6 and S7 go down for some reason.

All data in the range 376–600 now falls to server S1. S1 holds a lot more data and has to serve far more requests (almost 3 times its previous load).

Although we still have 5 servers left on the ring, the failure of S6 and S7 has to be absorbed entirely by the next server on the ring (S1).

This is a simplistic example; in real life, the data can run into millions of records or more. So we still have a challenge: a single node (server S1 in this case) can get overloaded with requests and data.

Scaling — with Virtual Servers on a Ring

The solution is to create many virtual replicas of the existing servers and map them onto the ring at random locations (or at locations based on weight).

We still have the same number of physical servers (6 in the first example above). However, we create multiple virtual replicas of these 6 servers. Let us assume each physical server is mapped to 5 virtual replicas, which gives us 30 (virtual) servers in total to place on the ring.

Each virtual server is then mapped to a key range of 20 on the ring (600 locations across 30 virtual servers).

The mapping can still be driven by a hashing technique (or any other suitable algorithm) to keep it random yet well distributed.

For simplicity, the above example assumes that all 30 virtual servers cover equal ranges on the ring and that the mapping to physical servers is equally weighted, though random. In real life, any custom strategy can be used for both of these mappings, based on factors like server capacity.

This gives another layer of flexibility to tweak the possible load on a given server, based on its capacity and traffic prediction.
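
As a rough sketch of how the virtual replicas could be placed, each physical server can be hashed several times under a "server#replica" label; the label format and the replica count are assumptions for illustration.

```python
import hashlib

RING_SIZE = 600
VIRTUAL_NODES_PER_SERVER = 5   # 6 physical servers x 5 = 30 virtual nodes

def ring_position(value: str) -> int:
    digest = hashlib.md5(value.encode()).hexdigest()
    return int(digest, 16) % RING_SIZE

def virtual_positions(server: str) -> list:
    # Scatter each physical server across several pseudo-random ring locations.
    return sorted(ring_position(f"{server}#{i}") for i in range(VIRTUAL_NODES_PER_SERVER))

for server in ["S1", "S2", "S3", "S4", "S5", "S6"]:
    print(server, virtual_positions(server))
```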

In this strategy, if one physical server goes down, its 5 virtual nodes on the ring go down with it. However, these 5 nodes are scattered at different, random locations on the ring. We may still need to move a key range of about 100 (5 virtual nodes x 20 keys each in this example), but these ranges sit at different places on the ring, and a different physical server is next in line after each of them.

In the above example, if S3 goes down, its load is distributed among S6, S5, and others; refer to the table in the above image.

Hence, the load is spread across several physical servers, and no single server is burdened with all of it.

This means that even if one or two nodes go down, the impacted keys are distributed across many servers, while the amount of data movement stays in check.

This solves both issues: it keeps bulk data movement in check and keeps the load fairly distributed across the server nodes.

Isn't this awesome?

Summing Up

This concept of hashing both servers and keys onto a ring, so that their mappings are not inter-dependent, is called Consistent Hashing.

To sum up the benefits of consistent hashing:

  • It distributes the load across nodes quite efficiently.
  • Failure of one node does not affect the data of all other nodes. The impact of rehashing is contained within a few nodes, with a minimal amount of data movement.
  • If a node goes down, its load is distributed across many other nodes, so no single node is overburdened.
  • It makes scaling up and down efficient, which is a core requirement for distributed systems.

This makes Consistent Hashing one of the most preferred choices for distributed systems, caches, and data stores.

But how do nodes ensure durability if a node fails? The system keeps multiple copies of each node's data on other nodes in the ring, driven by a replication factor that is configurable in most systems. This ensures that data is not lost even in a bad failure.
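
As a rough sketch, reusing the ring built in the earlier example: walk clockwise from the key's position and collect the first few distinct servers as the replica set. The function name and the replication factor of 3 are assumptions, not from the article.

```python
import bisect

def replica_servers(ring, key, replication_factor=3):
    # Collect the primary plus the next distinct servers clockwise on the ring.
    positions = ring.positions
    start = bisect.bisect_left(positions, ring_position(key)) % len(positions)
    replicas = []
    for i in range(len(positions)):
        server = ring.servers[positions[(start + i) % len(positions)]]
        if server not in replicas:
            replicas.append(server)
        if len(replicas) == replication_factor:
            break
    return replicas

print(replica_servers(ring, "user:42"))  # e.g. the primary and two backup nodes
```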

Data replication has its own complexities, deeper use cases, and interesting design aspects too. We shall discuss it in one of the coming articles.

Till then, stay tuned, Happy Learning…

If you enjoyed reading this, please share, give a clap, and follow for similar stories!

For any suggestions, feel free to reach me on Linkedin: Mohit Gupta
