The CAP theorem has been a hallmark of theoretical computer science for over two decades. But with technologies that allow developers to push the boundaries of this theorem, is it still worth considering?
In this blog, we give an overview of the CAP theorem, discuss how it applies to modern databases, and give our take on whether or not it is still relevant in modern data system design.
In theoretical computer science, the CAP theorem, also named Brewer's theorem after computer scientist Eric Brewer, states that it is impossible for a distributed data store to simultaneously provide all three of the following desirable guarantees: consistency, availability, and partition tolerance.
This theorem was further solidified by two MIT researchers, Seth Gilbert and Nancy Lynch, when they published their proof of it, ‘Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services.’
System design guided by the CAP theorem takes into account that, even in 2021, no database system can truly offer all three guarantees in their pure form. These databases are described using any two of the three CAP letters. So, for instance, a “CA” database offers consistency and availability across all nodes but cannot guarantee valid responses in cases where its data has been horizontally partitioned across nodes.
Arguably the easiest way to understand consistency is to think of transactions against the database as single, atomic, unchangeable actions. In a distributed database, this means, at its simplest, that any time a user reads a record that was recently written by any user, they receive the most recently written version of that record, not one written before the time at which the query is taking place.
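To make that definition concrete, here is a minimal sketch (plain Python, with hypothetical names) of a single-copy register in which every read returns the most recently written value:

```python
class ConsistentRegister:
    """Toy single-copy register: every read returns the most recent write."""

    def __init__(self):
        self._value = None
        self._version = 0

    def write(self, value):
        self._version += 1
        self._value = value
        return self._version

    def read(self):
        # Reads always reflect the latest completed write -- never stale data.
        return self._value, self._version


reg = ConsistentRegister()
reg.write("a")
reg.write("b")
value, version = reg.read()
# value is "b": the read observes the most recent write, never the older "a"
```

A distributed system must preserve this same guarantee even when the register's state is copied across several nodes, which is where the difficulty begins.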
Availability has specific implications as well. In a fully available system, a client connected to the database will always receive a valid response from the database system, regardless of whether some of the system's nodes are unavailable for any reason. A valid response is one with consistent, non-stale data, free of exceptions or errors.
It's important to remember that here “partition” is the verb; in other words, we are talking about separating a set of data. In this case, it refers to a database engine being tolerant of having a data set split between multiple nodes in a cluster, even if those nodes lose network connectivity to one another. In heavily distributed database systems, it's often necessary to split data across multiple nodes to achieve better horizontal scale, a process known as “sharding” or “horizontal partitioning.”
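Hash-based sharding is one common way to decide which node holds which record. A minimal sketch, assuming a two-node cluster and a hypothetical `shard_for` helper:

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to one of num_shards nodes (hash-based sharding)."""
    # Use a stable hash (not Python's per-process hash()) so that the same
    # key always lands on the same shard, run after run.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# With two nodes, each key deterministically lands on node 0 or node 1.
placement = {key: shard_for(key, 2) for key in ["user:1", "user:2", "user:3"]}
```

Because placement is deterministic, any node can compute where a record lives; the tradeoff is that changing the shard count remaps most keys, which is why real systems often layer consistent hashing or range-based chunks on top.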
Consider a database with two clients and two nodes, where a single set of data is horizontally partitioned or sharded across those two nodes, and where either client can fail over to either node.
For this database to remain consistent, an update made by either client to either node must be propagated to both nodes before both clients can receive valid responses.
This works fine when both nodes are up, but what happens when one node goes down?
If a client wrote anything to that down node, and that node didn’t replicate it to the other node before going down, then consistency is lost. What if we split the data between the nodes?
In this horizontally partitioned system, we have partition tolerance and we have consistency, but if we lose availability on any node, we lose consistency as well.
Here's how the CAP theorem and MongoDB relate.
Horizontal scale is especially important for NoSQL database systems, and so the ability to horizontally shard can offer invaluable performance benefits to a NoSQL cluster.
That said, NoSQL databases also tend to be highly distributed, with many clients and aggressive SLA requirements, so availability and consistency are key.
When designing a big data platform, a business should fully consider its priorities for the system, as no system will be able to deliver on all three guarantees at all times.
MongoDB is a single-primary system, meaning that it can offer consistency and partition tolerance, but not availability across all nodes. If the primary node becomes unavailable, a secondary node is elected as the new primary based on which node received updates most recently.
This centralizes responsibility for updates to the system in one node, which guarantees consistency and even allows for sharding, since a central database can proxy reads from sharded nodes.
However, the system cannot guarantee availability across all nodes: during a leader election, the newly elected central node must take time to synchronize from the other replicas in the system, and it is necessarily unavailable during that time.
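As an illustration only (MongoDB's real elections also weigh member priority, votes, and election terms), the “most recent updates wins” criterion could be sketched like this, with hypothetical node records:

```python
def elect_new_primary(members):
    """Pick the reachable secondary with the highest operation timestamp.

    Illustrative sketch only: it models just the 'most recent updates'
    criterion, not a full consensus protocol."""
    live = [m for m in members if m["up"]]
    return max(live, key=lambda m: m["optime"])["name"]


members = [
    {"name": "node-b", "optime": 104, "up": True},
    {"name": "node-c", "optime": 101, "up": True},
    {"name": "node-d", "optime": 110, "up": False},  # down, so ineligible
]
new_primary = elect_new_primary(members)  # "node-b"
```

Note that node-d has the newest data but is unreachable, so the most up-to-date *reachable* node wins; any writes that only ever reached node-d are at risk.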
Despite advancements and rethought algorithms such as partially-synchronous models, CAP theorem remains relevant today.
We are, however, in a better spot in terms of dealing with the problem than we used to be.
In this model, a central database node does two important things:
First, it maintains the state of the other database nodes in the system. When a node makes an update or requests data, it lets the central node know about the action first.
Second, it mediates the validity of responses to the other nodes. After sending the update to the central node, if that central node doesn’t respond with an acknowledgement after a certain amount of time, the node that made the update assumes that the message has been lost, and lets the client know that.
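The acknowledgement timeout described above can be sketched as follows; the function and its parameters are hypothetical, and the “network” is simulated by a delay value rather than real I/O:

```python
def send_update(central_ack_delay, timeout=0.5):
    """Simulate sending an update to the central node and awaiting its ack.

    central_ack_delay is how long the (simulated) central node takes to
    acknowledge, in seconds; None means the ack never arrives."""
    if central_ack_delay is None or central_ack_delay > timeout:
        # No ack within the window: assume the message was lost
        # and surface the failure to the client.
        return "update-lost"
    return "acknowledged"


ok = send_update(0.1)     # central node replied within the window
lost = send_update(None)  # no ack: the client is told the write failed
```

The timeout value is the key tuning knob: too short and healthy-but-slow nodes are treated as failed; too long and clients wait on a node that is genuinely gone.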
This model improves overall consistency in a highly available and partition-tolerant database system, and it will always return a valid response to a client as long as a central node is available. That response, however, may contain no data or stale data, which still violates atomic consistency.
The partially synchronous model will lead to an “eventually-consistent” or Delayed-t consistent system as long as availability is maintained along the nodes, and a system can then decide on the level of staleness that is acceptable in favor of system availability. In a system like this, the data is ultimately replicated to enough available nodes that, once all bits of data are delivered to each node, the system does reach consistency. There will just be gaps of time where the system is inconsistent, and the business must agree on how wide these gaps are allowed to be.
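The staleness window and the eventual convergence can be modeled with a toy store that queues replication messages and delivers them later; all names here are illustrative:

```python
class EventuallyConsistentStore:
    """Toy store whose replicas converge once queued messages are delivered."""

    def __init__(self, num_replicas):
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = []  # (replica_index, key, value) not yet delivered

    def write(self, key, value):
        self.replicas[0][key] = value             # apply locally first
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))  # replicate asynchronously

    def deliver_all(self):
        # The "eventually" part: once every queued replication message
        # arrives, all replicas agree.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


store = EventuallyConsistentStore(3)
store.write("x", 1)
inconsistent = store.replicas[1].get("x")  # None: inside the staleness window
store.deliver_all()
consistent = store.replicas[1]["x"]        # 1: the system has converged
```

The gap between the `write` and the `deliver_all` is exactly the staleness window the business must agree to tolerate.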
Although there’s no question that advanced thought and better algorithms have made the CAP theorem problem easier to deal with, each of these solutions still sacrifices some or all of one CAP component. In many ways, then, as a guiding principle for building databases, the problem has never been more relevant!
The industry is also using other software methodologies, like storage fabric replication and object storage, deployment automation, and even container orchestration, to work around these issues in database system design.
It’s a worthy exercise for any business to consider the state and purpose of their data systems, and validate that the chosen technology fully provides whichever two of the three CAP features are most important to the business application.
OpenLogic can help your team select, plan, integrate, and support the open source databases that make modern systems possible. Talk with an expert today to see how we can help.
Ex-Chief Evangelist - OSS & API Management, Perforce Software
Justin has over 20 years of experience working in various software roles. He is an outspoken free software evangelist, delivering enterprise solutions, technical leadership, and community education on databases, architectures, and integration projects.