Scalability Laws
When selecting an application architecture, the choice is often made easier when the application has high-availability and/or high request-volume requirements. A key feature of the microservice approach is its ability to provide scalability through service replication. To understand why microservices are such a good fit, it is essential to understand the most significant impediment to scaling: Resource Contention.

Resource Contention

Resource contention exists any time two or more things vie for control of a single limited resource. Because only one contender can access the resource at a time, all others must queue and wait for that contender to finish before they can proceed. As the number of contenders increases, the time a new contender must wait for service is the sum of each previously queued contender's execution time.

As the load on a resource increases, the queue of contenders waiting for that resource grows deeper. If the resource is unable to service the queue with sufficient throughput, the queue depth will eventually grow to the point where the service appears unresponsive to new requests. While the acceptable response time is application-specific, it is easy to see that the service can no longer keep up with its load when the response time consistently exceeds that threshold.
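
To make the arithmetic concrete, here is a minimal sketch of that queueing effect; the fixed 100 ms service time and FIFO queue are illustrative assumptions, not properties of any particular system.

```python
# Minimal sketch: a single contended resource serviced FIFO.
# Assumption (for illustration only): every request takes a fixed 100 ms to service.

SERVICE_TIME = 0.100  # seconds per request


def wait_time(queue_depth: int) -> float:
    """Time a newly arriving request waits: the sum of the service
    times of everything already queued ahead of it."""
    return queue_depth * SERVICE_TIME


for depth in (1, 10, 100, 1000):
    print(f"queue depth {depth:>4}: new request waits {wait_time(depth):6.1f} s")
```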

Linear Scaling

A common approach to addressing contention is to add resources that can share the load, reducing the time any contender waits in the queue. A real-world example of this can be found in many retail establishments.

To prevent long queues of customers waiting to purchase products, most merchants provide multiple registers and cashiers (except Walmart, which inexplicably has 30 registers but only 1 cashier). By adding registers and cashiers, a merchant can service multiple customers in parallel, thereby reducing the time any individual customer must wait.

In theory, this would give us linear scaling: with n cashiers, we should be able to service n times as many customers at any instant. Unfortunately, this is a classic example of the disconnect between our intuition and the real world.
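
The intuition can be expressed as a trivial model; the per-cashier rate below is an arbitrary illustrative number, and the model assumes work divides perfectly with no coordination cost between cashiers.

```python
# Minimal sketch of the linear-scaling intuition.
# Assumption (illustrative): each cashier serves 2 customers per minute and
# work divides perfectly, with no coordination cost between cashiers.

RATE_PER_CASHIER = 2.0  # customers per minute


def ideal_throughput(cashiers: int) -> float:
    """Throughput if scaling were perfectly linear: n cashiers -> n x the rate."""
    return cashiers * RATE_PER_CASHIER


for n in (1, 2, 4, 8):
    print(f"{n} cashier(s): {ideal_throughput(n):.0f} customers/minute")
```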

Amdahl's Law

While working on the IBM System/360 in 1967, Gene Amdahl presented a paper at the AFIPS Spring Joint Computer Conference addressing how contention limits parallelization.

In short, the law defines the theoretical maximum improvement that can be gained through parallelization. Amdahl's law states that when calculating the benefit of parallelizing a program (or algorithm), it is necessary to identify which parts can be parallelized and which must execute serially. The serial portion becomes the limiting factor.

Amdahl’s formula states that the theoretical improvement for a given fixed workload can be calculated using:

$$S_{latency}= {1 \over (1-p) + p / s } $$

Where:

  • S_latency is the theoretical maximum speedup of the whole task.
  • p is the proportion of execution time spent in the portion that can be optimized (parallelized).
  • s is the speedup of that optimized portion.

As s increases,

$$\lim\limits_{s \to \infty} S_{latency}= {1 \over 1-p} $$

The theoretical maximum speed increase is constrained by the execution time of the unparallelizable portion. This constraint prevents us from achieving linear scalability.
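
As a quick sanity check of the formula, the following sketch evaluates S_latency for an illustrative workload that is 90% parallelizable; the numbers are chosen only to show how the speedup flattens toward the 1/(1-p) limit.

```python
# Minimal sketch of Amdahl's law (values chosen purely for illustration).

def amdahl_speedup(p: float, s: float) -> float:
    """Theoretical speedup S_latency for a workload where a fraction p of the
    execution time is sped up by a factor s; the remaining (1 - p) stays serial."""
    return 1.0 / ((1.0 - p) + p / s)


p = 0.90  # 90% of the execution time can be parallelized
for s in (2, 4, 8, 16, 1024):
    print(f"s = {s:>4}: speedup = {amdahl_speedup(p, s):.2f}")

# The limit as s -> infinity is 1 / (1 - p): no matter how many workers we add,
# the serial 10% caps the overall speedup at roughly 10x.
print(f"limit: {1.0 / (1.0 - p):.2f}")
```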

Coherency Delay

In microservice architectures, and distributed systems in general, synchronizing state between services is often accomplished through a message bus. Each time a service updates its internal state, it is common practice for the service to broadcast an event containing the relevant data to other interested services by publishing it to the message bus.

Other services interested in these events subscribe to the message bus and are notified when these events occur. These events provide a mechanism through which each notified service can update its representation of state with the information payload contained in the event. In this way, a microservice architecture can maintain data consistency across its services.
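
A rough, in-process sketch of that publish/subscribe flow is shown below; the bus, topic name, and services are hypothetical stand-ins for a real message broker and real services.

```python
# Minimal, in-process sketch of event-based state synchronization.
# The bus, topic name, and services here are hypothetical stand-ins for a
# real message broker and real services.
from collections import defaultdict
from typing import Callable


class MessageBus:
    """A toy publish/subscribe bus: handlers register per topic and every
    published event is delivered to all handlers for that topic."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(event)


bus = MessageBus()

# A hypothetical "billing" service keeps its own copy of customer addresses.
billing_addresses = {}
bus.subscribe("customer.updated", lambda e: billing_addresses.update({e["id"]: e["address"]}))

# The "customer" service updates its internal state, then broadcasts the change.
bus.publish("customer.updated", {"id": "42", "address": "221B Baker St"})
print(billing_addresses)  # {'42': '221B Baker St'}
```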

This approach allows microservice architectures to share data between services without sharing a common database. However, it is important to understand that, unlike data in a shared database, there is a quantifiable delay before the data is synchronized across all interested services.

The time it takes to achieve synchronization between services is referred to as the coherency delay. As the number of services that need to be updated increases, there is a corresponding increase in the coherency delay as events propagate from producer to consumers.

Gunther's Universal Scalability Law

The Universal Scalability Law was defined by Neil Gunther in 1993 and extends Amdahl's law to account for coherency delay. As a system scales up, the coordination cost between nodes eventually exceeds the scaling benefit, due to the combined costs of contention and coherency delay.

The Universal Scalability Law is generalized by the formula:

$$C(N)= {N \over 1 + \alpha(N-1) + \beta N(N-1) } $$

Where:

  • C(N) is the relative capacity of the computational platform.
  • N is the number of nodes (the level of concurrency).
  • α is the level of contention in the system.
  • β is the coherency delay in the system.
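
To see the formula in action, the sketch below evaluates C(N) for an illustrative pair of α and β values; the numbers are arbitrary and chosen only to show the shape of the curve.

```python
# Minimal sketch of the Universal Scalability Law (alpha and beta values
# below are arbitrary, chosen only to illustrate the behaviour).

def usl_capacity(n: int, alpha: float, beta: float) -> float:
    """Relative capacity C(N) for N nodes, contention alpha, coherency delay beta."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"N = {n:>2}: C(N) = {usl_capacity(n, alpha=0.05, beta=0.001):5.2f}")
```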

Visualizing the Effects of Contention and Coherency Delay

An application that has no contention or coherency delay:

α=0, β=0
This is true linear scaling.

When we introduce contention into the application, we begin to see that the system's throughput is negatively impacted as load increases.

α>0, β=0


As we increase the load on the application, we reach a point where we begin to see diminishing returns due to resource contention.

α>>0, β=0


As we introduce coherency delay into the same system, we see not only diminishing throughput but eventually an actual decline, as the system struggles to reach a coherent state.

α>>0, β>0

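The same shapes can be reproduced numerically. The sketch below prints C(N) under four illustrative parameter settings; the α and β values are arbitrary stand-ins for the regimes described above, and the last column peaks and then declines as coherency delay dominates.

```python
# Numeric stand-in for the four curves described above. The alpha/beta values
# are arbitrary illustrations of each regime, not measurements of any system.

def usl_capacity(n: int, alpha: float, beta: float) -> float:
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


regimes = {
    "linear (a=0, b=0)":      (0.00, 0.0),
    "contention (a>0, b=0)":  (0.03, 0.0),
    "heavy contention":       (0.20, 0.0),
    "contention + coherency": (0.20, 0.01),
}

print("N".rjust(4) + "".join(name.rjust(26) for name in regimes))
for n in (1, 2, 4, 8, 16, 32, 64):
    row = f"{n:>4}" + "".join(
        f"{usl_capacity(n, a, b):26.2f}" for a, b in regimes.values()
    )
    print(row)
```
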

Summary

Both Gunther's and Amdahl's laws undermine the promise of linear scalability through parallelization. The logical conclusion we must draw is that linear scalability is, in practice, unachievable. The only way to achieve it is to remove all state from a service, which is impractical for all but the simplest applications. When designing microservice applications, it is vital to understand the limitations imposed by these laws and to look for opportunities to minimize the contention and coherency costs they describe.

Coming up

Now that we have had an opportunity to review the Laws of Scalability, we take a look at one of the most popular approaches used to design microservice architectures: Domain Driven Design.