For organizations that need to operate with data at scale, wide column databases offer an effective option. But not all wide column databases are created equally, and, despite the close similarities between popular options, each have their own benefits and drawbacks.
In this blog, our experts give an overview of wide column databases, including how they work, how they compare to similar database categories, and popular open source options in use today.
A wide column database, or wide column store, is a type of NoSQL database that stores data across flexible columns. These columns can then be stored across multiple servers or database nodes.
Similar to relational databases, wide column databases organize data into columns. But that’s where most of the practical similarities end. Wide column databases like Cassandra and Hadoop (HBase) are built with scale in mind, and they can store distributed, partitioned columns of data.
Columns are also treated differently in wide column databases, with the capacity for multiple column formats and names across rows. Because of how the data is accessed and stored, it also allows for higher compression of data and the facilitation of large volumes of data.
As the name implies, columnar databases are defined by storing data tables by column. These columns can then be stored across servers.
Wide column databases like Cassandra and Hadoop (HBase) are slightly different. They store data much in the same way, but add variable column names and formatting within the same table. Since the individual columns are vertically partitioned, that data can be horizontally scaled across many servers or databases with minimal performance overhead.
Common use cases for wide-column databases favor those that write a large amount of non-structured data, including logging and reporting systems, time series data, and user preference data.
Use cases that require immediate data consistency are not a good fit for wide column databases like Cassandra, as, generally, they are eventually consistent, but not immediately consistent across all places where the data is held.
Read Our Full Guide to Open Source DatabasesWhether you want insight on wide column databases, key-value stores, or classic RDBMS, our Decision Maker's Guide to Open Source Databases is a must-read. Download your free copy below!Download the Guide
Whether you want insight on wide column databases, key-value stores, or classic RDBMS, our Decision Maker's Guide to Open Source Databases is a must-read. Download your free copy below!
Download the Guide
If you're looking for an open source wide column databases, the most popular open source options today are Cassandra, and Hadoop. Specifically for Hadoop, it's the HBase database used within Hadoop. Cassandra and HBase are ranked one and two on the DB-Engines rankings for Wide column stores, with Cassandra as the 11th most popular database, and HBase as the 24th most popular overall.
Cassandra is an open source wide-column NoSQL database originally conceived at Facebook. It focuses on being highly distributed, deploying easily across multiple clouds. This distribution lends itself to optimal efficiency in writes, outpacing many of its NoSQL competitors in its ability to take in large amounts of data all at once.
This distribution also makes it highly available and reliable. Cassandra’s wide distribution makes it an ideal candidate for pairing with streaming data solutions such as Kafka and Spark, as its write-optimized architecture will provide minimal bottlenecks when deployed for those purposes.
Cassandra is meant for NoSQL systems that need to store a lot of data and distribute that data as much as possible. Companies who have the need for writing a lot of data quickly and reliably will be successful with it. However, for businesses more interested in ACID compliance, a document database may be a better choice.
Hadoop was the original big data open source ecosystem and saw tremendous success early on in its inception. Hadoop (and the underlying HBase database) paved the way for numerous well-known and accepted big data concepts including data lakes and distributed ledgers. Though still widespread in its use and adoption, Hadoop’s batch-oriented patterns are not always suitable for predictive analytics which focus on streaming and analyzing large amounts of data at once, in-memory.
Starting new projects with Hadoop is becoming less and less common, but businesses are still building applications and data solutions against existing Hadoop implementations. For more static and batch-driven data solutions, Hadoop is still a solid choice, but be aware that streaming architectures such as Cassandra, Spark, and Kafka claim as much as 100x increased speed when dealing with big data tasks such as MapReduce.
In the right circumstances, a wide column database can enable horizontal scale of your data, and even provide eventual consistency of that data. And, while wide column databases like Cassandra or Hadoop aren't the right fit for all applications, they are well-aligned with a surging need for streaming data and (at least for Cassandra) will likely see increased adoption in years to come.
Get Technical Support for Your Wide Column DatabaseWhether you're considering, implementing, or supporting an open source database, OpenLogic can help you achieve your goals. Ready to learn more about how OpenLogic can help support your team? Talk to an open source database expert today.Talk to an Expert
Whether you're considering, implementing, or supporting an open source database, OpenLogic can help you achieve your goals. Ready to learn more about how OpenLogic can help support your team? Talk to an open source database expert today.
Talk to an Expert
In our latest Open Source Trend Report, our experts weigh in on the top open source database in use today, and analyze the results of a public survey of development professionals.
Download the Trend Report