Navigating the world of open source databases can be tricky. Even if you've narrowed down your search to a particular type of database, there are still a number of viable options to choose from. In this article, we dive in on document databases, with an overview of how they work, and potential use cases. Afterward, we discuss enterprise-ready open source document databases and how companies can choose the right database for their needs.
Document databases, also called document-oriented databases, are non-relational databases used to store and manage semi-structured data. Though similar to key-value databases in some ways, document databases use internal structure within the document to extract metadata.
Document databases work similarly to key-value databases, in that they store data in tables, with data objects that can sprawl across multiple tables. Because objects are stored as a single instance within a database, these objects are non-relational.
Unsurprisingly, documents are the central component of a document database, and can be encoded in XML, YAML, JSON, and other data languages. Documents don't need to follow a standard schema, and can include metadata to provide additional capabilities for the implementation.
Document databases are a good fit for applications that work with non-uniform and non-relational data. Because Document databases aren't bound by rigid data schema, they can accommodate incoming data that have unique or changing schema.
Common use cases for document databases include content management systems, user profiles, catalogs, and more. Document databases are also a good fit for the extraction of real-time big data.
Document databases are hierarchical and allow developers to follow the document-model approach often used in writing applications. Below, you can see a typical hierarchical organization within a document database.
Get the Full Guide to Open Source DatabasesIn our Decision Maker's Guide to Open Source databases, our experts discuss the top open source databases by niche, including document databases, relational databases, non-relational databases, graph databases, key-value databases, and more. Download the full guide for free by clicking the button below.DOWNLOAD THE GUIDE
In our Decision Maker's Guide to Open Source databases, our experts discuss the top open source databases by niche, including document databases, relational databases, non-relational databases, graph databases, key-value databases, and more. Download the full guide for free by clicking the button below.
DOWNLOAD THE GUIDE
While there are a number of available open source document databases, MongoDB and Couchbase are among the most popular. According to db-engines.com, MongoDB is by far the most popular document database, while Couchbase is rated as the fourth most popular option. When measured against all database types, MongoDB and Couchbase are ranked as the 5th and 29th most popular, respectively.
In the following sections, we take a closer look at these three open source document databases, and how they stack up.
Editors note: The latest release field reflects the latest stable release as of September 2, 2021.
Couchbase Server, originally known as the Membase project, is a NoSQL document database with a focus on performance and scale. It contains three internal database engines, a cache, a key-value store, and a document database, allowing for flexibility in its use case. Couchbase offers a multi-source replication clustering functionality that allows for concurrent writes to multiple nodes in the cluster, making it easier to scale the database. Though the setup process is a little involved, the end result is in an out-of-the-box UI which eases administration and monitoring of the cluster.
Couchbase is an immutable database which is what allows it to scale the way that it does. Instead of bottlenecking writes through a single source that then replicates, multiple cluster nodes can be written to and will replicate in turn. This even allows Couchbase to be upgraded in place without downtime, a major consideration if you plan to support an application with extreme uptime requirements.
Couchbase’s main hinderance, relative to a database like MongoDB, is its reliance on a combination of data indexes including MapReduce makes it somewhat less efficient than more cutting-edge document database indexes. That’s not to say it isn’t powerful, but MongoDB will win on indexing performance, where Couchbase will win on write performance.
MongoDB has seen a meteoric rise in popularity since its release in February of 2009. MongoDB currently sit at number 5 in popularity on the list of database on DB-Engines. This is especially impressive given how new of a database it is, when compared to its peers in the top 5. As a traditional document store, MongoDB is capable of ingesting large, unstructured documents of data in JSON and reliably presenting and preserving those documents.
MongoDB supports a clustering mechanism which allows it to scale and provide fast read capabilities for clients that need to retrieve data.
MongoDB use a flexible document format to store data, making it very useful for cases where you may be storing large data sets whose schema doesn’t match.
To accomplish this with replication and redundancy, while maintaining consistency, MongoDB can only allow one node of its cluster to write to the database at a time. This does create an inherent bottleneck, and, while this can be overcome in part by the ability to shard the database, it will not be as write-performant as other databases like Cassandra which can allow for concurrent writes to the database.
Apache Jackrabbit is an implementation of the Java Content Repository (JCR) standard. This is an object store for Java, which can effectively act as a document database, in that unstructured data in the form of Java objects can be persisted and retrieved from the store natively.
Jackrabbit, and JCR for that matter, is an older Java standard, but the functionality is still very useful. Though not as universal as other document store solutions, Jackrabbit doesn’t require any marshaling or un-marshaling of data to work seamlessly with other Java applications. Jackrabbit also has a full state management and transaction capability for the objects that are stored in it, making it suitable for use in distributed environments.
Finding the best database for your organization starts with determining the needs of your system and business. What type(s) of data do you need to store? Does all incoming data fit neatly into the same fields? Or do you need a database that can handle unique fields by entry? Do you need your data to be highly available? Highly consistent? Having an understanding of these needs is critical.
Ultimately, aligning your system and business needs with the capabilities of the database is what will get you the best fit. Using a database against the grain can be costly, so it's important for organizations to get it right the first time.
Whether you're working with MongoDB and need help troubleshooting an error, need help migrating to a new database, or just want advice on how to find the open source database that's right for you, our team of enterprise architects can help. Learn more about what we offer and how we can support your goals by visiting the link below.