CAP Theorem in Distributed Database Systems

Almost a couple of decades ago, Eric Brewer introduced the idea that there exists a trade-off among consistency, availability, and partition tolerance. This trade-off is known as the CAP Theorem! CAP theorem is an an important concept that helps architects in designing distributed database systems.

What is CAP Theorem?

The CAP theorem states that distributed databases can have at most two of the three properties in a given time: Consistency, Availability, and Partition Tolerance. In other words, it is impossible for a distributed computer system to simultaneously guarantee all three properties.

  1. Consistency in the CAP Theorem means that all nodes in a distributed database system should have the same data at any given time, regardless of the node they are connected to. When a client performs a read operation, they should receive the most recent write value. So to maintain consistency, the node where the write operation is performed has the responsibility to instantly replicate the data to all other nodes. If there is a failure to maintain consistency, there will be some outdated data in the application.
  2. Availability in the CAP Theorem means that a distributed database system should always be able to respond to client requests i.e. every request should receive a response, without the guarantee that it contains the most recent write. So the system remains operational even in the event of failures. Some techniques to achieve high availability are redundancy, replication, load balancing, etc.
  3. Partition tolerance in the CAP Theorem means that a distributed database system should continue operating even if there is a network partition. So what is network partition? Sometimes due to some disruption in the network (network failures or other factors), nodes get separated into groups so that they can not communicate with each other. This is called network partition.

Network failures can disrupt the system's ability to access and update the data. So network partitions are unavoidable and partition tolerance is a must-have requirements in distributed DBMS systems. That's why most of the time, we need to choose between consistency (all nodes see the same data) or availability (the system remains operational and responsive).

Maintaining consistency can come at the cost of increased latency and reduced availability, because system may need to wait for all nodes to agree on the current state before responding to client requests. On the other side, maintaining availability can lead to data inconsistencies because nodes may return stale data.

CAP Theorem Database Architecture

NoSQL databases are highly distributed and provide horizontal scalability. They can rapidly scale across a growing network of multiple interconnected nodes. Due to this, the idea of the CAP theorem is highly relevant in the case of NoSQL databases!

But as discussed above, one can only have two of the three available functionalities. So based on various choices, here are three different systems based on the CAP theorem.

  • CP System: This system focuses more on consistency and partition tolerance. So these systems are not available most of the time. When any issue occurs in the system, it has to shut down the non-consistent node until the consistency problem is resolved, and during that time, it is not available.
  • AP System: This type of database focuses more on availability and partition tolerance rather than consistency. When any issue occurs in the system, then it will no longer remain in a consistent state. However, all the nodes remain available, and affected nodes might return a stale data, and the system will take some time to become eventually consistent.
  • CA System: This type of database focuses more on consistency and availability across all nodes rather than partition tolerance. Fault-Tolerance is the basic necessity of any distributed system. So it is almost rare to use a CA type of architecture for any practical purpose.

What is CAP Theorem?

Use Cases of CAP Theorem

  • MongoDB focuses on a CP database style. By default, it maintains consistency while compromising on availability: If you read, then you will always get the most recent value of write. There is a reason: By default, MongoDB is a single-leader system, where all reads will go to the leader. If required, MongoDB provides the option to enable read operations from the follower nodes. In that case, MongoDB becomes eventually consistent and it will work as an AP system!
  • Cassandra focuses on an "AP" database style, which concentrates entirely on availability and partition tolerance rather than consistency. In other words, Cassandra will prioritize the ability to always respond to client requests and provide eventual consistency.
  • Microservices-based applications often rely on the CAP theorem to design the most efficient databases. For example, if horizontal scalability is essential with eventual consistency, an AP database like Cassandra can help meet deployment requirements. On the other hand, if the application depends heavily on data consistency like a payment service, it may be better to opt for a relational database like PostgreSQL, which focuses on a CP database style.

Note: Some modern database systems rarely fall completely into any one of these categories because they provide configurations to select replication methods, consistency level, etc. This can have an effect on all three parameters. The truth is: Most of the systems have the goal to achieve all combinations of these three states. So rather than classify a database system as CP or AP, we need to think in terms of the options they provide for tuning these properties, based on our use case.

Conclusion

The CAP theorem provides an understanding of the challenges and trade-offs involved in designing distributed database systems. It guides decision-making, database selection, and system design to achieve the desired balance between consistency, availability, and partition tolerance.

Additional reading: CAP Twelve Year Later: How rules have changed?

Thanks to Suyash for his contribution in creating the first version of this content. If you have any queries or feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy system design!

More from EnjoyAlgorithms

Self-paced Courses and Blogs