Guide

Intro to Open Source Databases

For organizations that want to break free of expensive data platforms, or that simply want to leverage the latest innovations in data management, open source databases are a popular option. However, the wide selection of seemingly similar databases can make finding the right one difficult.

In our intro to open source databases, we give an overview of available databases, describe the various open source database types and the popular databases within each type, and provide resources that will help you learn more about the databases you need.

What Is an Open Source Database?
Open Source Database List
Relational Databases
Graph Databases
Wide Column Databases
Key Value Databases
Document Databases
Additional Resources

What Is an Open Source Database?

An open source database is a database, or database management system, that is free to download, modify, and re-use.

While the exact details of how open source databases can be used vary by open source license type, the guiding principle is that the underlying code for the database is publicly accessible and modifiable.

A list of popular open source databases can be found below.

Open Source Database List

The list below is borrowed from our recent white paper, the Decision Maker’s Guide to Open Source Databases. While it’s not a comprehensive list of all available open source databases, it does feature the databases we consider viable for enterprise organizations.

Type	Database	Related Data	Structured Data	Read Workload	Write Workload
Relational Databases	PostgreSQL	✓	✓	Moderate	Moderate
Relational Databases	MySQL	✓	✓	Moderate	Moderate
Relational Databases	MariaDB	✓	✓	Moderate	Moderate
Graph Databases	Neo4j	✓	x	Heavy	Moderate
Graph Databases	JanusGraph	✓	x	Heavy	Moderate
Wide Column Databases	Cassandra	x	✓	Heavy	Heavy
Wide Column Databases	Hadoop	x	✓	Heavy	Heavy
Key Value Databases	Redis	✓	✓	Heavy	Heavy
Key Value Databases	Elasticsearch	✓	✓	Heavy	Heavy
Key Value Databases	etcd	✓	✓	Heavy	Heavy
Key Value Databases	Couchbase	✓	✓	Heavy	Heavy
Key Value Databases	Prometheus	✓	✓	Heavy	Heavy
Document Databases	CouchDB	✓	x	Heavy	Moderate
Document Databases	MongoDB	✓	x	Heavy	Moderate
Document Databases	Jackrabbit	✓	x	Heavy	Moderate

Relational Databases

Relational databases are based on the relational model of data, and provide a means to store and access related data. Relational databases are typically ACID compliant, meaning they operate with atomicity, consistency, isolation, and durability.

Relational databases are ideal for applications or processes that use structured data and need a high level of data integrity and security.
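To make the atomicity part of ACID concrete, here is a minimal sketch of a transfer that either applies both of its updates or neither. It uses SQLite only because it ships with Python; SQLite is not one of the databases covered in this guide, and the accounts table, the names, and the helper functions are all hypothetical.

```python
import sqlite3


def open_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit leaves a change that must be undone.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; nothing was applied.
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = open_bank()
transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
transfer(conn, "alice", "bob", 30)   # both updates commit together
print(balances(conn))
```

The first transfer fails on its second statement, and the rollback also undoes the credit that had already been applied. That all-or-nothing behavior is what the "A" in ACID refers to.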

Popular open source relational databases / database management systems include:

PostgreSQL
MySQL
MariaDB

PostgreSQL

PostgreSQL is one of the most well-known open source databases, and is often compared in terms of features and functionality with larger, commercial databases such as Oracle and DB2.

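Postgres achieves strong consistency and ACID compliance through its use of MVCC (Multiversion Concurrency Control) and WAL (Write-Ahead Logging). MVCC lets readers work from a consistent snapshot of the data without blocking writers, while WAL records changes in a log before they are applied to the data files, so that committed work can be recovered after a crash.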
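The idea behind multiversion concurrency control can be sketched in a few lines: writes add new versions instead of overwriting old ones, and each reader only sees versions committed at or before its snapshot. This is a toy illustration under our own assumptions, not how PostgreSQL implements MVCC; the VersionedStore class and its methods are hypothetical.

```python
class VersionedStore:
    """Toy multiversion store: every write appends a new version of a key."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit counter

    def write(self, key, value):
        """Commit a new version of `key` and return its commit timestamp."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Return a timestamp that freezes the reader's view of the data."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before `snapshot_ts`."""
        visible = None
        for commit_ts, value in self._versions.get(key, []):
            if commit_ts > snapshot_ts:
                break  # later versions are invisible to this snapshot
            visible = value
        return visible


store = VersionedStore()
store.write("price", 10)
reader_view = store.snapshot()  # a reader starts here
store.write("price", 12)        # a later write does not disturb that reader

print(store.read("price", reader_view))       # the reader still sees 10
print(store.read("price", store.snapshot()))  # a new reader sees 12
```

Because old versions stay around, the reader never has to wait for the writer. A real system also needs a way to clean up versions that no snapshot can see any longer.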

MySQL

Perhaps the best-known open source RDBMS, MySQL forms the "M" in the ubiquitous LAMP stack.

MySQL is a well-rounded database. While it is not as capable on enterprise concerns as Postgres or Oracle, it adapts well to most use cases requiring moderate scale.

MariaDB

Forked from MySQL, MariaDB has since added new features, including support for XtraDB, an enhanced fork of the InnoDB storage engine.

Much like its progenitor, MariaDB is a popular, well-rounded database. Unlike MySQL, it’s “guaranteed to stay open source.”

Relational Database Resources

Blog – Guide to Open Source Relational Databases
Blog – What Is PostgreSQL?
Blog – Exploring the PostgreSQL System Catalogs
Blog – PostgreSQL Support Options
Blog – EnterpriseDB vs. PostgreSQL
Blog – PostgreSQL vs. MongoDB
Blog – PostgreSQL vs. MySQL
Blog – MySQL vs. SQLite
Blog – What Is an SQL Database?
Blog – Unpacking the PLEASE_READ_ME MySQL Ransomware
Blog – RDBMS vs. NoSQL

Graph Databases

Graph databases are designed to maintain both data and the relationships between data points. In fact, the relationship data is just as important as (if not more important than) the data points themselves.

Graph databases are useful when the connections between data points are important. Potential use cases include fraud detection, network operations, access management, and real-time recommendation engines.
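As a rough illustration of why relationships matter, the sketch below stores labeled connections between entities and walks them to find everything within a few hops of a starting point, the kind of traversal a fraud check might run. It is plain Python, not the query language or API of any graph database, and the Graph class, node names, and relationship labels are made up for the example.

```python
from collections import defaultdict, deque


class Graph:
    """Toy property-free graph: nodes joined by labeled, undirected edges."""

    def __init__(self):
        self._edges = defaultdict(list)  # node -> [(relationship, neighbor)]

    def relate(self, a, relationship, b):
        """Record a relationship between two nodes in both directions."""
        self._edges[a].append((relationship, b))
        self._edges[b].append((relationship, a))

    def within_hops(self, start, max_hops):
        """Breadth-first search: {node: hop count} for nodes near `start`."""
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if distance[node] == max_hops:
                continue  # do not expand past the hop limit
            for _relationship, neighbor in self._edges.get(node, []):
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        del distance[start]
        return distance


g = Graph()
g.relate("account_a", "USES", "device_1")
g.relate("account_b", "USES", "device_1")
g.relate("account_b", "PAYS_WITH", "card_9")
g.relate("account_c", "PAYS_WITH", "card_9")

# Accounts that share a device with account_a are two hops away.
print(g.within_hops("account_a", 2))
```

In a relational schema the same question would take a chain of joins. A graph database stores the connections directly, so following them is the primary operation.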

Neo4j

Neo4j is a well-known and widely used graph database implementation for Java applications. It is available in both community and enterprise editions and focuses on performance and ease of use. The community edition comes with an impressive set of features, but for enhanced security, availability, and scale, the enterprise edition is recommended.

JanusGraph

JanusGraph is an open source, community fork of the Titan graph database, whose original developer, Aurelius, was acquired by DataStax. Like Titan, it focuses on high distribution, throughput, and the ability to handle heavy complexity.

Wide Column Databases

Wide column databases are defined by their ability to use variable column names and formats across rows. This type of database excels at quickly accessing columnar data, and can be sharded to enhance scalability.
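The two ideas in that definition, rows that carry different columns and sharding by row key, can be sketched as follows. This is an in-memory toy under our own assumptions, not Cassandra's data model or API; WideColumnTable, its methods, and the sample rows are hypothetical.

```python
import hashlib


class WideColumnTable:
    """Toy wide column table: each row holds its own set of named columns."""

    def __init__(self, shard_count):
        # Each shard maps row_key -> {column_name: value}.
        self._shards = [{} for _ in range(shard_count)]

    def _shard_for(self, row_key):
        """Pick a shard from a stable hash of the row key."""
        digest = hashlib.sha256(row_key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self._shards)
        return self._shards[index]

    def put(self, row_key, column, value):
        """Set one column on one row; other rows are unaffected."""
        self._shard_for(row_key).setdefault(row_key, {})[column] = value

    def get_row(self, row_key):
        """Return a copy of the row's columns, or {} if the row is absent."""
        return dict(self._shard_for(row_key).get(row_key, {}))


table = WideColumnTable(shard_count=4)
table.put("user:1", "name", "Ada")
table.put("user:1", "email", "ada@example.com")
table.put("user:2", "last_login", "2024-01-01")  # different columns, same table

print(table.get_row("user:1"))
print(table.get_row("user:2"))
```

Because a row key always hashes to the same shard, a lookup touches only one shard, which is what lets this style of database spread across many nodes.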

Popular open source wide column databases include Cassandra and Hadoop.

Cassandra

Cassandra is an open source wide-column NoSQL database originally conceived at Facebook. It focuses on being highly distributed, deploying easily across multiple clouds.

Cassandra’s wide distribution makes it an ideal candidate for pairing with streaming data solutions such as Kafka and Spark, as its write-optimized architecture introduces few bottlenecks when deployed for those purposes.

Hadoop

Hadoop was the original open source big data ecosystem and saw tremendous success early in its life. Hadoop paved the way for numerous well-known and accepted big data concepts, including data lakes and distributed ledgers.

Though still widely used, Hadoop’s batch-oriented patterns are not always suitable for predictive analytics, which focuses on streaming and analyzing large amounts of data at once, in memory.

Wide Column Database Resources

Blog – Apache Spark vs. Hadoop
Blog – What is Apache HBase?
Blog – A Second Wind for Apache Ambari
Blog – Ambari vs. Cloudera Manager
Blog – Cassandra vs. MongoDB
Blog – Architecting Applications With Apache Cassandra
Solution – Hadoop Service Bundle
Solution – Apache Cassandra Support and Services
White Paper – The New Stack: Cassandra, Kafka, and Spark
On-Demand Webinar – Streaming Data: Why Real-Time Wins the Race

Key Value Databases

Key value databases are a popular type of non-relational database, used where horizontal scaling is a necessity. They associate each value with a unique key, which is used to identify and retrieve the object.

Redis

Redis was one of the first key value caching solutions available as open source and has seen widespread adoption across a range of use cases. One of its most popular use cases was as an enterprise-class session cache, but it has since found applications in other data use cases such as fraud analysis and inventory systems.
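A session cache boils down to a key value store whose entries expire. The sketch below shows that pattern in plain Python. It does not use Redis or mimic its commands; the KeyValueStore class, the ttl_seconds parameter, and the injectable clock (there only to make expiry easy to demonstrate) are our own assumptions.

```python
import time


class KeyValueStore:
    """Toy key value store with optional per-key expiry, as a cache would use."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be shown without waiting

    def set(self, key, value, ttl_seconds=None):
        """Store a value; with ttl_seconds, it expires after that long."""
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        """Return the value for `key`, or `default` if missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired entry
            return default
        return value


now = [0.0]  # a fake clock we can advance by hand
cache = KeyValueStore(clock=lambda: now[0])
cache.set("session:42", {"user": "ada"}, ttl_seconds=30)

print(cache.get("session:42"))  # still valid
now[0] = 31.0
print(cache.get("session:42"))  # expired, so the default (None) comes back
```

Every operation addresses exactly one key, which is why this model scales horizontally so easily: keys can be spread across servers without any cross-server coordination for a single read or write.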

Elasticsearch

Elasticsearch is a “search engine” style of key value database. It takes the capabilities and simplicity that come with key value stores and extends the indexing and searching features a little further.

This makes it ideal for searching lots of freeform data, which is why Elasticsearch forms the critical E in the ELK Stack.

etcd

etcd is the default service registry and backing store included with Kubernetes, and was designed to be a highly scalable database to hold service endpoints inside a Kubernetes deployment. Its data model is solidly in the realm of a key value structure, but its primary access methods are meant to be universal and ubiquitous, so it allows for cloud-compatible integrations such as JSON/HTTP and gRPC.

Prometheus

Prometheus, the second project to be sponsored by the Cloud Native Computing Foundation (after Kubernetes), has become the de facto standard for gathering metric data from Kubernetes implementations. It’s a high-performance time series key value database with a focus on accessibility.
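One way to picture a time series key value database is a store in which the key is a metric name plus a set of labels and the value is a growing list of timestamped samples. The sketch below follows that picture. It is not Prometheus's storage engine or query language; TimeSeriesStore, the metric name, and the labels are hypothetical.

```python
from collections import defaultdict


class TimeSeriesStore:
    """Toy time series store keyed by metric name and labels."""

    def __init__(self):
        self._series = defaultdict(list)  # series key -> [(timestamp, value)]

    @staticmethod
    def _key(name, labels):
        """Build a hashable key; label order must not matter."""
        return (name, tuple(sorted(labels.items())))

    def append(self, name, labels, timestamp, value):
        """Add one sample to the series identified by name and labels."""
        self._series[self._key(name, labels)].append((timestamp, value))

    def query_range(self, name, labels, start, end):
        """Return samples with start <= timestamp <= end for one series."""
        samples = self._series.get(self._key(name, labels), [])
        return [(t, v) for t, v in samples if start <= t <= end]


tsdb = TimeSeriesStore()
api = {"service": "api", "pod": "api-0"}
tsdb.append("http_requests_total", api, timestamp=10, value=100)
tsdb.append("http_requests_total", api, timestamp=20, value=140)
tsdb.append("http_requests_total", {"service": "web", "pod": "web-0"}, 20, 7)

# Labels select the series; the time range selects the samples.
print(tsdb.query_range("http_requests_total", api, start=15, end=30))
```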

Key Value Database Resources

Blog – Guide to Key-Value Databases
Blog – Exploring Redis Alternatives
On-Demand Webinar – Monitoring Java Applications With Prometheus and Grafana
Blog – How To Visualize Prometheus Data with Grafana
Blog – How to Configure Prometheus AlertManager
Blog – How to Use Prometheus Monitoring With Java to Gather Data

Document Databases

Document databases, or document-oriented databases, are non-relational databases used to store and manage semi-structured data (also known as document-oriented data).

Though similar to key value databases, document databases use internal structure within the document to extract metadata.
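The difference from a key value store is easiest to see in a query: a document database can look inside the stored value. The sketch below keeps JSON-style documents under an ID, as a key value store would, but can also match on a nested field. It is plain Python rather than the API of any document database, and DocumentCollection, get_path, and the sample documents are hypothetical.

```python
import json


def get_path(document, dotted_path):
    """Follow a dotted path such as 'address.city'; None if any part is missing."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


class DocumentCollection:
    """Toy document collection: documents by ID, queryable by inner fields."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        """Store a JSON-compatible copy of the document under its ID."""
        self._docs[doc_id] = json.loads(json.dumps(document))

    def get(self, doc_id):
        """Key value style lookup by ID."""
        return self._docs.get(doc_id)

    def find(self, dotted_path, expected):
        """Return sorted IDs of documents whose field at the path equals `expected`."""
        return sorted(
            doc_id
            for doc_id, doc in self._docs.items()
            if get_path(doc, dotted_path) == expected
        )


people = DocumentCollection()
people.insert("p1", {"name": "Ada", "address": {"city": "London"}})
people.insert("p2", {"name": "Linus", "address": {"city": "Helsinki"}, "tags": ["kernel"]})
people.insert("p3", {"name": "Grace"})  # documents need not share a shape

print(people.find("address.city", "London"))
```

Documents in the same collection do not need the same fields, which is the "semi-structured" part: the structure lives in each document rather than in a fixed table schema.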

Couchbase

Couchbase Server, originally known as the Membase project, is a NoSQL document database with a focus on performance and scale. It contains three internal database engines (a cache, a key value store, and a document database), allowing for flexibility in its use case.

MongoDB

MongoDB has seen a meteoric rise in popularity since its release in February of 2009. MongoDB currently sits at number 5 in popularity on the list of databases on DB-Engines.

As a traditional document store, MongoDB is capable of ingesting large, unstructured documents of data in JSON and reliably presenting and preserving those documents.

Jackrabbit

Apache Jackrabbit is an implementation of the Java Content Repository (JCR) standard. This is an object store for Java, which can effectively act as a document database, in that unstructured data in the form of Java objects can be persisted and retrieved from the store natively.

Document Database Resources

Blog – What Is MongoDB?
Blog – Big Data on Demand With MongoDB

Additional Resources

Looking for additional resources on open source databases? Be sure to review the links below.

Blog – Comparing the Top Open Source Databases of 2024
Blog – InfluxDB and Telegraf Overview
White Paper – Decision Maker’s Guide to Open Source Databases
Trend Report – 2021 Open Source Database Trend Report
On-Demand Webinar – Real-Time Data Lakes: Kafka Streaming With Spark
