Photo by Jeremy Thomas on Unsplash.com
When selecting an application architecture, the choice is often made easier when the application has high-availability or high request-volume requirements. A key feature of the microservice approach is its ability to provide scalability through service replication. To understand why microservices are such a good fit, it is essential to understand the most significant impediment to scaling:
Resource Contention.
Resource Contention
Resource contention exists any time two or more things vie for control of a single limited resource. Only one contender can access the resource at a time, forcing all others to queue and wait for that contender to finish before they can proceed. As the number of contenders grows, the time a new contender must wait for service is the sum of the execution times of every contender queued ahead of it.

As the load on a resource increases, the queue of contenders waiting for that resource grows deeper. If the resource cannot service the queue with sufficient throughput, the queue depth will eventually grow to a point where the service appears unresponsive to new requests. While acceptable response time is application-specific, it is easy to see that the service is no longer sufficient for its load when response times consistently exceed the acceptable threshold.
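The queueing behavior described above can be sketched in a few lines. This is a simplified illustration (function and variable names are mine, not from any particular library): with a single resource, a new arrival's wait is simply the sum of the service times of everyone queued ahead of it.

```python
def wait_time(queued_service_times):
    """Time a new arrival waits behind a single resource: the sum of
    the service times of every contender queued ahead of it."""
    return sum(queued_service_times)

# Three contenders already queued, each needing 2 seconds of service.
queue = [2.0, 2.0, 2.0]
print(wait_time(queue))  # 6.0 -- latency grows with queue depth
```

As arrivals outpace the resource's service rate, this sum grows without bound, which is exactly when the service starts to look unresponsive.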
Linear Scaling
A common approach to addressing contention is to add resources that can share the load, reducing the time any contender waits in the queue. A real-world example of this can be found in many retail establishments.

To prevent long queues of consumers waiting to purchase products, most merchants provide multiple registers and cashiers to serve customers (except Walmart, which inexplicably has 30 registers but only 1 cashier). By adding registers and cashiers, a merchant can serve multiple customers in parallel, thereby reducing the time any individual consumer must wait.
In theory, this would give us linear scaling: with n cashiers, we should be able to serve n times as many customers at any instant. Unfortunately, this is a classic example of the disconnect between our intuition and the real world.
Amdahl's Law
While working on the IBM System/360 in 1967, Gene Amdahl presented a paper at the AFIPS Spring Joint Computer Conference addressing how contention limits parallelization.

In short, the law defines the theoretical maximum improvement that can be gained through parallelization. Amdahl's law states that when calculating the benefit of parallelizing a program (or algorithm), it is necessary to identify which parts can be parallelized and which must execute serially. The serial portion becomes the limiting factor.
Amdahl’s formula states that the theoretical improvement for a given fixed workload can be calculated using:
$$S_{latency} = {1 \over (1 - p) + p / s}$$

Where:
- $S_{latency}$ is the theoretical maximum speedup of the whole task.
- $p$ is the fraction of execution time spent in the portion that can be optimized.
- $s$ is the speedup of that optimized portion.
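The formula above is easy to evaluate directly. A small sketch (the function name is mine) showing how the serial portion caps overall speedup, no matter how far the parallel portion is accelerated:

```python
def amdahl_speedup(p, s):
    """Theoretical maximum speedup when a fraction p of the workload
    is accelerated by a factor s (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / s)

# Suppose 95% of the work parallelizes. Speeding that portion up 10x
# yields well under 10x overall:
print(round(amdahl_speedup(0.95, 10), 2))         # 6.9
# Even with an effectively unlimited speedup of the parallel portion,
# the 5% serial remainder caps the result near 1 / 0.05 = 20x:
print(round(amdahl_speedup(0.95, 1_000_000), 2))  # 20.0
```

This is why adding cashiers (or service replicas) yields diminishing returns: whatever cannot be parallelized dominates as $s$ grows.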