Sharding

Consistency and Scalability

We know that consistency creates contention and contention has a negative impact on application scalability. To increase an application's scalability, we must minimize resource contention and reduce the volume of updates between nodes. How do we balance an application's consistency requirements with its need to scale? A common strategy often used within reactive systems to accomplish this is a technique called sharding Sharding.

Sharding

Sharding is a technique that provides strong consistency while lowering contention on entities by distributing them across multiple nodes called Shards. Each persistent entity will exist in only one shard, and that shard will exist in only one location. We determine the shard location of any entity by using an entity's shard key. The shard key can be a synthetic key, or it can be composed of one or more entity attributes. A function is used to map the entity's shard key to a shard and must ensure that it will only ever map a given entity to a single shard.

Sharding creates a consistency boundary around the entity by isolating it and confining it to a single shard. By distributing the entities across multiple shards, we are able to decrease the opportunity for contention as a function of the number of shards.

Sharding Overview

Unlike database sharding which shards data within a database, this approach to entity management is implemented at the application level and requires a service to route messages to the correct shard location. This routing service is responsible for calculating the entity's shard key and translates that key into a shard location. Once the shard location has been determined the router forwards the entity's messages to the correct shard.

Sharding Controller

In general, entities are sharded at the Aggregate Root level to reduce the total number of shards required by the system.

Shard Balancing

The primary goal of sharding is to minimize contention. To accomplish this goal, it is critical that the application's entities are distributed in a balanced fashion across the shards. The selection of an appropriate shard key is essential to balancing the entities across the shards. Unbalanced sharding will increase traffic to specific shards creating hot-spots. These hot-spots negatively impact performance by increasing contention in the higher-activity shards. To avoid creating hot-spots, it is important to select a shard key that generates an even, randomized distribution of keys across all shards. Otherwise, the higher contention on the dominant shards lowers their availability and undermines the sharding process.

Contention within the Sharding system

It is important to remember that while allowing us to isolate contention, sharding doesn't remove it entirely. Instead, sharding attempts to spread any contention across the collection of shards. In addition to the individual shards, a common bottleneck for contention can often be the routing service itself. The shard routing service must have sufficient throughput to route the sharding system's messages. Otherwise, it becomes the weakest link in the chain. Fortunately, shard routing services are generally stateless and can be safely replicated to minimize contention. As is the case with most replicated services, this comes with the costs associated with service discovery. In addition to the shard and the shard router service, the application can also still encounter contention at the entity level itself, if that entity is exceptionally active.

It is important to remember that the goal of a sharding system is to isolate contention to individual entities and to allow for scaling by distributing shards across multiple nodes.

Availability within the Sharding system

Sharding distributes possible contention across multiple shards in a distributed system. Sharding is a consistency (CP) solution, and we know from CAP theorem that as such it must sacrifice availability when a network partition occurs. What should we do when a shard becomes unavailable? One solution is to provide a caching layer above the shard. By adding caching to the shard routing service, we can increase both performance and availability. Since most systems are read-heavy, we check the cache first and only go to the shard on a cache miss. This can signifcantly increase performance. If a shard becomes temporarily unavailable its most active entities should still be in the cache improving overall entity availability.

Summary

Sharding spreads our entities across multiple shards to reduce contention on the entity access mechanism. Sharded system performance is heavily influenced by the selection of the shard key, so it is critical to select an appropriate shard key to avoid excessive activity in any one shard. By avoiding entity replication we are able to achieve high data consistency and still provide high availability with a sharded strategy.

Coming up

What if we want availability over consistency? Sharding can strike a good balance between consistency and availability. However, what can we do if we need to prioritize availability over consistency? In the next article, we will introduce an availability-centric strategy called Conflict-Free Replicated Data Types.