CAP Venn Diagram
Historically, when we wanted to increase our ability to serve more users or store more data, the first choice was Vertical Scaling, also known as "get a bigger server." Unfortunately, there comes a point where that approach is no longer financially practical, or there is simply no more powerful machine to buy. When that happens, the only option is to scale horizontally. As we begin re-engineering the application to scale horizontally, we must contend with how to manage three traits: Consistency, Availability, and Partition Tolerance.

  • Consistency

    Consistency is the ability to read from any replicated instance of a service and always receive the most recent write made to any replica of that service.

  • Availability

    Availability is the ability of a service to accept a request and return a response within a reasonable time, without timing out or returning an error. The definition of reasonable is, of course, application specific.


  • Partition Tolerance

    A network partition occurs when one or more of the application's services become unreachable. Partition Tolerance is the application's ability to continue functioning when such a partition occurs.


CAP Theorem

CAP Theorem was first proposed by Eric Brewer in 1998 and describes the relationship between Consistency, Availability, and Partition Tolerance in distributed systems. The theorem is predicated on the observation that in a distributed system, network partitions are a fact of life and must be factored into the application's design.

CAP theorem states that when a network partition occurs, a choice must be made between Consistency and Availability. In the absence of a network partition, both Consistency and Availability can be satisfied.
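
To make the choice concrete, here is a minimal sketch (in Python, with hypothetical names) of a single replica handling a read while it cannot reach its peers. A consistency-first configuration refuses the request; an availability-first configuration answers from local, possibly stale, data.

    class PartitionedReplica:
        """One replica's view of the world during a network partition."""

        def __init__(self, prefer_consistency, reachable_peers, cluster_size):
            self.prefer_consistency = prefer_consistency
            self.reachable_peers = reachable_peers    # peers this node can still talk to
            self.cluster_size = cluster_size
            self.local_data = {"stock:42": 7}         # may be stale during the partition

        def has_quorum(self):
            # this node plus the peers it can reach must form a majority
            return self.reachable_peers + 1 > self.cluster_size // 2

        def read(self, key):
            if self.has_quorum():
                return self.local_data[key]           # safe to answer either way
            if self.prefer_consistency:
                raise RuntimeError("no quorum: refusing to return a possibly stale value")
            return self.local_data[key]               # stay available, accept staleness

    ap = PartitionedReplica(prefer_consistency=False, reachable_peers=0, cluster_size=3)
    cp = PartitionedReplica(prefer_consistency=True, reachable_peers=0, cluster_size=3)
    print(ap.read("stock:42"))   # 7 - available, but possibly stale
    # cp.read("stock:42") raises - consistent, but unavailable during the partition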


Dealing with Network Partitions

Once we accept that network partitions will occur in a distributed application, we must choose which aspect of the application's behavior we are willing to sacrifice, and to what degree:

  1. Sacrifice Consistency (maintain Availability and Partition Tolerance). In this scenario we allow writes on both sides of the partition. When the partition has been resolved, both sides must merge their data to return to a consistent state, as shown in the sketch after this list.
  2. Sacrifice Availability (maintain Consistency and Partition Tolerance). In this scenario, after a partition has occurred, one side of the partition is disabled to prevent inconsistency. The remaining side stays consistent, and the application continues with a lower degree of availability.
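
Below is a minimal sketch of the merge step in option 1, assuming each side recorded a timestamp with every write and conflicts are resolved with a simple last-write-wins rule (real systems may instead merge values, use vector clocks, or ask the user):

    # key -> (value, timestamp) accepted on each side while the partition was in effect
    side_a = {"user:7:email": ("old@example.com", 100), "user:8:name": ("Ada", 120)}
    side_b = {"user:7:email": ("new@example.com", 150)}

    def merge(side_a, side_b):
        """Combine both sides, keeping the newest write for each key (last-write-wins)."""
        merged = {}
        for key in sorted(set(side_a) | set(side_b)):
            a = side_a.get(key, (None, 0))
            b = side_b.get(key, (None, 0))
            merged[key] = a if a[1] >= b[1] else b
        return merged

    print(merge(side_a, side_b))
    # {'user:7:email': ('new@example.com', 150), 'user:8:name': ('Ada', 120)}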

Strong Consistency

Traditional databases provide consistency through ACID transaction semantics. Every statement within a transaction must complete; otherwise the transaction fails and any statements already executed are rolled back.
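
As an illustration, the sketch below uses Python's built-in sqlite3 module: the two UPDATE statements succeed or fail as a unit, and raising an exception inside the transaction rolls both of them back (the table and the invariant are made up for the example).

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
    conn.commit()

    try:
        with conn:  # opens a transaction; commits on success, rolls back on an exception
            conn.execute("UPDATE accounts SET balance = balance - 150 WHERE id = 1")
            conn.execute("UPDATE accounts SET balance = balance + 150 WHERE id = 2")
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")   # abort the whole transaction
    except ValueError:
        pass

    print(conn.execute("SELECT id, balance FROM accounts").fetchall())
    # [(1, 100), (2, 50)] - both UPDATEs were rolled back, the data is unchanged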

Transaction management is required to maintain consistency during the execution of the transaction (and during any rollback that may occur). A Transaction Manager manages contention between writes to the participating tables or rows: during the transaction window, it locks those tables or rows to prevent access by other clients. As we have seen in previous articles, these locks create contention on the resources and limit their availability for the duration of the transaction. Here the Transaction Manager is prioritizing consistency over availability. We may choose row locks to minimize the contention, since row locks have a smaller scope than table locks, but we are still affecting availability.
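
The sketch below imitates that behavior with an in-process lock standing in for a row lock (the names are illustrative): while one writer holds the lock, every other writer either waits or times out, which is exactly the availability cost described above.

    import threading
    import time

    row_locks = {"account:1": threading.Lock()}   # a toy lock manager: one lock per row
    balances = {"account:1": 100}

    def debit(amount, worker):
        lock = row_locks["account:1"]
        if not lock.acquire(timeout=0.5):         # other writers block here (reduced availability)
            print(f"{worker}: lock timeout, request fails rather than risk inconsistency")
            return
        try:
            current = balances["account:1"]
            time.sleep(0.1)                       # simulate work inside the transaction window
            balances["account:1"] = current - amount
        finally:
            lock.release()                        # end of the transaction window releases the lock

    workers = [threading.Thread(target=debit, args=(10, f"writer-{i}")) for i in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    print(balances)   # {'account:1': 70} - writes were serialized, so no update was lost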

The overhead incurred by the Transaction Manager is necessary to achieve strong consistency and is usually a manageable compromise in a single-node system. Achieving the same level of consistency in a distributed system, however, requires transaction management spanning all participating nodes. A distributed transaction manager would need to distribute every state change and achieve consensus across all participating nodes before the transaction could be considered complete, and the same effort would be required for rollback operations.
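
A distributed transaction protocol such as two-phase commit illustrates the cost. The sketch below (hypothetical class and method names) shows the coordinator's logic: every participant must vote yes in the prepare phase before anything is committed, and a single failure aborts the transaction everywhere.

    class Participant:
        """One node taking part in a distributed transaction."""

        def __init__(self, name, will_succeed=True):
            self.name = name
            self.will_succeed = will_succeed
            self.staged = None
            self.committed = None

        def prepare(self, change):
            self.staged = change          # stage the change and hold locks until the outcome is known
            return self.will_succeed      # vote yes or no

        def commit(self):
            self.committed = self.staged  # make the staged change durable

        def rollback(self):
            self.staged = None            # discard the staged change and release locks

    def two_phase_commit(participants, change):
        votes = [p.prepare(change) for p in participants]   # phase 1: prepare / vote
        if all(votes):
            for p in participants:
                p.commit()                                   # phase 2: commit everywhere
            return "committed"
        for p in participants:
            p.rollback()                                     # any "no" vote aborts everywhere
        return "aborted"

    nodes = [Participant("orders"), Participant("payments"), Participant("inventory", will_succeed=False)]
    print(two_phase_commit(nodes, {"order_id": 42, "status": "PAID"}))   # aborted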

Because all of these operations execute across a network of multiple nodes, the communication overhead and failure-management requirements make this an impractical solution. In distributed systems, strong consistency is therefore often relaxed, and we embrace an Eventually Consistent model.

Eventual Consistency

Eventual Consistency is a compromise: we accept the weaker guarantee that all participating nodes will eventually converge on the same value, and in exchange we gain higher availability of the service. Instead of ACID semantics, we get BASE (Basically Available, Soft state, Eventual consistency) semantics.
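
Here is a minimal sketch of the idea, with made-up names: a write is acknowledged as soon as one replica applies it, a replication log propagates it to the others later, and reads in between may be stale, but once replication catches up every replica returns the same value.

    from collections import deque

    class Cluster:
        def __init__(self, replica_count):
            self.replicas = [{} for _ in range(replica_count)]
            self.pending = deque()                   # replication log not yet applied everywhere

        def write(self, key, value):
            self.replicas[0][key] = value            # acknowledge once a single replica has it
            self.pending.append((key, value))        # propagate to the rest asynchronously

        def read(self, key, replica):
            return self.replicas[replica].get(key)   # may be stale until replication catches up

        def replicate(self):
            while self.pending:                      # the background "soft state" catch-up pass
                key, value = self.pending.popleft()
                for store in self.replicas[1:]:
                    store[key] = value

    cluster = Cluster(replica_count=3)
    cluster.write("profile:1", "Ada")
    print(cluster.read("profile:1", replica=2))   # None - a stale read while replication lags
    cluster.replicate()
    print(cluster.read("profile:1", replica=2))   # Ada - all replicas have converged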

Summary

CAP theorem forces us to consider the ramifications of network partitions on a microservice architecture. When developing an individual microservice, we must carefully consider the service's requirements and find an appropriate balance between consistency and availability. It is important to remember that this optimization is decided at the individual-service level. An application may have services optimized for consistency as well as others optimized for availability. The optimization should always be grounded in the application's requirements.

Coming up

Now that we have a basic understanding of CAP theorem, we can look at several strategies that help us achieve our optimization goals. In the next article, we will discuss Sharding, a scalability strategy intended to reduce contention and improve availability.