For organizations that handle large amounts of unstructured data and need high availability and scalability for that data, Apache Cassandra is an increasingly popular open source option.
In this blog, we give an overview of Apache Cassandra, with details on how it works, how it's used, and a rundown of key features and benefits.
Cassandra is a NoSQL database written in Java. It offers high availability and horizontal scalability, and is capable of handling high volumes of data, including unstructured data types. Because it does not depend on a rigid relational schema or on joins, Cassandra can partition and replicate data across nodes more easily than many other databases.
Cassandra was originally developed at Facebook to power inbox search. It was released as open source in 2008, became an Apache Incubator project in 2009, and graduated to a top-level Apache project in 2010.
At its core, Cassandra is a peer-to-peer system whose design is based on two key technologies: Amazon’s Dynamo and Google’s Bigtable. Every node in the cluster can accept reads and writes, which eliminates the need for a master node, as each node is treated as an equal. When thinking of a cluster, it’s easier to envision groups of data centers rather than just individual servers. The beauty of Cassandra is that you can keep adding nodes to the cluster and expand your database as you need to.
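To make the masterless idea concrete, here is a minimal sketch of a token ring in Python. It is an illustration only, not Cassandra's implementation: the `Ring` class, the node names, and the use of MD5 as the hash are all assumptions chosen for brevity (Cassandra's default partitioner uses a different hash). The point it shows is that any peer can work out which node owns a key from the key alone, so no coordinator-of-record is needed.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is a stand-in hash here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A toy token ring: every node is a peer, and any of them can locate a key's owner."""

    def __init__(self, nodes):
        # Each (hypothetical) node is placed on the ring at the hash of its name.
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, key: str) -> str:
        # The owner is the first node clockwise from the key's token, wrapping around.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._entries[i][1]


ring = Ring(["node-a", "node-b", "node-c"])
owners = {key: ring.owner(key) for key in ("user:1", "user:2", "user:3", "user:4")}
```

Because ownership is a pure function of the key and the list of nodes, every node computes the same answer independently.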
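With the way it handles data, Cassandra can store structured, semi-structured, and unstructured data, providing a great level of flexibility. It’s designed to be used across multiple data centers, which makes for easy data distribution. Cassandra isn’t a traditional database, and it is not fully ACID (Atomicity, Consistency, Isolation, and Durability) compliant in the relational sense: writes are atomic, isolated, and durable at the level of a single partition, while consistency is tunable per operation rather than enforced through multi-row transactions with rollback.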
One of Cassandra’s biggest strengths is its ability to create an environment without a single point of failure. This decentralized approach makes it a great fit for organizations that have constantly growing or changing data needs, or that have data that must always be available.
There are a number of features that make Apache Cassandra an attractive option for enterprises, but its scalability, write speed, fault tolerance, and capacity for performance tuning make it stand out.
Adding nodes to a Cassandra cluster is meant to be easy, and can be done at any time as your needs grow. Instead of growing vertically (adding resources to a single server), Cassandra is meant to grow horizontally (adding servers), as far as you need and across as many geographical sites as needed.
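One reason horizontal growth stays cheap in ring-based designs is that a new node only takes over a slice of the existing data. The sketch below illustrates that property under simplifying assumptions: one token per node, MD5 as a stand-in hash, and hypothetical node and row names. Production clusters typically assign many tokens per node, which this toy version does not model.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Position of a string on the ring (MD5 is a stand-in hash for this sketch)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def owner(nodes, key: str) -> str:
    """Return the first node clockwise from the key's token."""
    entries = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in entries]
    i = bisect.bisect_left(tokens, token(key)) % len(entries)
    return entries[i][1]


keys = [f"row-{i}" for i in range(1000)]
before = {k: owner(["node-a", "node-b", "node-c"], k) for k in keys}
after = {k: owner(["node-a", "node-b", "node-c", "node-d"], k) for k in keys}

# Keys whose owner changed when node-d joined the ring.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to node-d")
```

Every key that moves goes to the new node; data is never reshuffled between the nodes that were already in the cluster.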
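The way that Cassandra handles data allows it to write to the database quickly. Each write is appended to a commit log and stored in an in-memory table, without first reading existing data from disk, and little up-front structure is required of incoming data. Together, these let you ingest data at very high rates.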
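The write speed is usually attributed to an append-oriented write path: a write is logged sequentially, kept in memory, and later flushed to disk in sorted, immutable batches. The following is a toy model of that idea, not Cassandra's code; the class name, the `flush_threshold` parameter, and the use of Python lists and dicts in place of files are all assumptions for illustration.

```python
class ToyWritePath:
    """Append-only writes: log first, then an in-memory table, flushed in sorted batches."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # sequential, append-only record of every write
        self.memtable = {}     # latest value per key, held in memory
        self.sstables = []     # immutable, sorted batches standing in for files on disk
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability: append, never seek
        self.memtable[key] = value            # no read-before-write
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the in-memory rows out as one sorted batch, then start fresh.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}


db = ToyWritePath()
for key, value in [("k3", "c"), ("k1", "a"), ("k1", "a2"), ("k2", "b")]:
    db.write(key, value)
```

Note that overwriting `k1` costs no more than the first write to it: the new value simply replaces the old one in memory, and the log only ever grows at the end.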
Because all nodes are treated as equals and data is replicated across them, losing a single node is not a big deal. With enough nodes and replicas, you can avoid ever reaching a full-blown “lights out” scenario.
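A small simulation can show why a node failure is survivable when data is replicated. This sketch assumes a replication factor of 3 and a simple "next nodes on the ring" placement; both, along with the node names and the helper functions, are assumptions made for the example rather than details taken from a real cluster's configuration.

```python
REPLICATION_FACTOR = 3  # assumed value for this sketch
NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical cluster


def replicas(key_index: int) -> list:
    """Pick REPLICATION_FACTOR consecutive nodes, starting at a slot chosen from the key."""
    return [NODES[(key_index + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


def can_serve(key_index: int, down: set, required: int) -> bool:
    """True if at least `required` replicas of the key are still up."""
    alive = [n for n in replicas(key_index) if n not in down]
    return len(alive) >= required


quorum = REPLICATION_FACTOR // 2 + 1  # a majority of replicas: 2 of 3

# One node down: every key still has a majority of its replicas available.
one_down = all(can_serve(k, {"node-b"}, quorum) for k in range(len(NODES)))

# Two adjacent nodes down: some keys lose their majority...
two_down_quorum = all(can_serve(k, {"node-b", "node-c"}, quorum) for k in range(len(NODES)))

# ...but every key still has at least one live replica.
two_down_single = all(can_serve(k, {"node-b", "node-c"}, 1) for k in range(len(NODES)))
```

Here `one_down` and `two_down_single` are true while `two_down_quorum` is false, which is the trade-off in miniature: how many failures you can absorb depends on both the number of replicas and how many of them an operation requires to respond.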
With Structured Query Language (SQL), you are dealing with relational databases, which are better suited to scaling vertically and work with table-based data and fixed schemas for moderate volumes of data. Because Cassandra is NoSQL, you can distribute data horizontally across the cluster more easily, it has the potential for massive scalability, and it is not subject to the confines of joins and fixed schemas.
Cassandra allows for a great deal of performance tuning on top of typical JVM tuning. One option that is often overlooked is table-level compression, which is enabled by default and can be configured when creating or altering tables.
In this FAQ, we answer common, high-level questions surrounding Apache Cassandra.
Cassandra is not a relational database, as its design does not support the relational data model. To elaborate, a relational model assumes all data is represented as n-ary relations, each of which is a subset of the Cartesian product of n domains. Cassandra differs by modeling data as a key-value store in which each value is a row of columns. There is no enforcement that all rows in a table have the same columns, which the relational model requires.
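The difference is easier to see with a small model. The sketch below represents a table as a mapping from a key to a row, where each row holds only the columns that were actually written. The `users` table, its column names, and the `upsert` helper are hypothetical, and plain Python dictionaries stand in for Cassandra's storage.

```python
from collections import defaultdict

# Hypothetical "users" table: partition key -> {column name: value}.
users = defaultdict(dict)


def upsert(table, key, **columns):
    """Write only the columns supplied; other columns of the row are untouched."""
    table[key].update(columns)


upsert(users, "alice", email="alice@example.com", city="Oslo")
upsert(users, "bob", email="bob@example.com")  # no "city" column for this row
upsert(users, "alice", city="Bergen")          # overwrites one column only

columns_per_row = {key: sorted(row) for key, row in users.items()}
```

After these writes, the row for `alice` has `city` and `email` columns while the row for `bob` has only `email`. A relational table would instead require every row to carry the same set of attributes, with an explicit value (or NULL) for each one.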
Yes, Cassandra is a NoSQL database. Its NoSQL data model is what allows it to take in and process massive amounts of data at high speed.
Cassandra is open source and licensed under the Apache License 2.0.
Cassandra is incredibly popular, used by thousands of companies around the world. The project's website lists a large number of case studies from some of the best-known companies in the world. At the time of writing, it's also ranked #11 on DB-Engines, and is the #1 ranked wide column database.
Cassandra is meant for workloads that need to store a lot of data and distribute that data as widely as possible. Companies that need to write a lot of data quickly and reliably will be successful with it. However, for businesses more interested in rock-solid integrity or preservation of data structure, a document database may be a better choice.
Whether you are implementing or considering Cassandra for your stack, OpenLogic can help. Contact an expert today to see how we can make your open source database journey a success.
Associate Enterprise Architect, OpenLogic by Perforce
Andrew's areas of specialization include networking, Linux, network security including OpenSSL, and operational troubleshooting. He has been working in the industry for over seven years and is acquiring new skills every day.