With so many open source databases available, finding the right one can be a challenge. That’s especially true for non-relational databases like MongoDB, where there are many viable options.
In this blog, we give an overview of MongoDB, answer frequently asked questions about the database, and discuss important components of MongoDB, including documents, storage engines, and sharding.
MongoDB is a popular NoSQL document-oriented database developed by MongoDB, Inc.
The name “Mongo” is derived from “humongous,” a nod to the database’s ability to store very large amounts of JSON-like data. Unlike tables in a relational database management system (RDBMS), documents are not required to follow a fixed schema. This means data that would live in related tables can be combined into a single document, in effect denormalizing the data.
Two terms are important in MongoDB. A document is the equivalent of a record (row) in an RDBMS. A collection is a grouping of documents, and its equivalent structure in an RDBMS is a table.
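To make the document and collection terms concrete, here is a minimal sketch in plain Python (no MongoDB server or driver involved). The customer and order rows, the field names, and the `to_document` helper are all hypothetical, chosen only to show how rows from two related tables can be folded into one document.

```python
# Hypothetical relational rows: customers and orders live in separate tables.
customers = [{"customer_id": 1, "name": "Ada"}]
orders = [
    {"order_id": 10, "customer_id": 1, "item": "keyboard"},
    {"order_id": 11, "customer_id": 1, "item": "monitor"},
]


def to_document(customer, all_orders):
    """Embed a customer's orders in a single document (denormalization)."""
    embedded = [
        {"order_id": o["order_id"], "item": o["item"]}
        for o in all_orders
        if o["customer_id"] == customer["customer_id"]
    ]
    return {
        "_id": customer["customer_id"],
        "name": customer["name"],
        "orders": embedded,
    }


# A collection is a grouping of documents; a plain list stands in for it here.
customer_collection = [to_document(c, orders) for c in customers]
print(customer_collection[0])
```

Reading one customer together with their orders now touches a single document instead of joining two tables. The trade-off is that the order data lives inside the customer document rather than in its own table.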
MongoDB has three different storage engines: In-Memory, WiredTiger, and the Encrypted Storage Engine.
The In-Memory storage engine is exactly what it sounds like: it keeps data in memory and writes very little to disk (only metadata and diagnostic data). It is meant for applications where performance is paramount and data can be ephemeral. If the machine loses power, the data is lost unless you are running a replica set. Do not use this storage engine if you require persistent data.
WiredTiger is the default storage engine for MongoDB. It uses document-level concurrency, which means that clients can modify different documents within a collection simultaneously. WiredTiger is suitable for most workloads.
The Encrypted Storage Engine is an enhanced version of the WiredTiger storage engine that supports encryption at rest. It is only available in MongoDB Enterprise.
You have several different options when deploying MongoDB: a standalone single instance, a replica set, and a sharded cluster.
If you want redundancy for your database, you will need to create a replica set. The recommended number of machines in a replica set is three, which gives you three copies of your data. The nodes in a replica set vote to elect a primary, which is the only node in the replica set that can accept writes. The other two nodes, the secondaries, can accept reads (after a configuration change, such as setting a read preference).
The data that your application typically queries is called the working set. We usually strive to have the working set fit into memory. If it does not fit, data has to be swapped between memory and disk, which will slow your application down considerably.
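The voting arithmetic behind that recommendation can be sketched in a few lines of Python. This is only the general majority rule (a primary needs votes from more than half of the voting members), not MongoDB's election code, and the function names are hypothetical.

```python
def votes_needed(voting_members: int) -> int:
    """Strict majority of voting members required to elect a primary."""
    if voting_members < 1:
        raise ValueError("a replica set needs at least one voting member")
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Members that can be unavailable while a primary can still be elected."""
    return voting_members - votes_needed(voting_members)


for members in (1, 2, 3, 5):
    print(members, votes_needed(members), fault_tolerance(members))
```

With three members, two votes form a majority, so the set can lose one node and still elect a primary. A two-member set tolerates no failures at all, which is one reason three is the usual starting point.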
What if you cannot fit your working set entirely into memory due to cost or other limitations? You will want to scale your cluster horizontally by using sharding.
MongoDB sharding is a horizontal partitioning of the data in a database. It divides your data between nodes based on a key, so that not all of the data falls into one replica set. Each shard then holds only part of the data, which makes it easier to fit your working set into memory.
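As a rough back-of-the-envelope sketch, you can estimate how many shards it would take for the working set to fit into memory. The function name and the sizes below are hypothetical, and the estimate assumes the working set spreads evenly across shards, which only holds with a well-chosen shard key.

```python
import math


def shards_needed(working_set_gb: float, usable_ram_per_shard_gb: float) -> int:
    """Estimate the shard count so each shard's slice of the working set fits in RAM.

    Assumes an even split of the working set across shards.
    """
    if working_set_gb < 0 or usable_ram_per_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(working_set_gb / usable_ram_per_shard_gb))


# Hypothetical numbers: a 300 GB working set, 128 GB of usable RAM per shard.
print(shards_needed(300, 128))
```

Treat the result as a starting point for capacity planning rather than a guarantee, since real data is rarely distributed perfectly evenly.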
For example, assume we have data being imported into a database from across the country. We have two geographically separated data centers, DCEast and DCWest, that each house a replica set. Each data center is placed close to the region where its data will be queried (with one node being a member of the other replica set for redundancy).
It would not make sense to place East data in the DCWest data center and vice versa. The data contains a location key that identifies where it came from. We can designate that location keys 1 to 100 go to data center DCEast, and keys 101 to 200 go to data center DCWest. By distributing data in this fashion, we have a better chance to fit our working set into memory and avoid swapping.
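The routing rule in this example can be sketched in plain Python. This only illustrates the idea of range-based routing using the key ranges from the example. It is not how MongoDB implements it, and `route_by_range` and the constants are hypothetical names.

```python
from bisect import bisect_left

# Inclusive upper bound of each key range, in ascending order,
# paired with the data center that owns that range.
RANGE_UPPER_BOUNDS = [100, 200]
DATA_CENTERS = ["DCEast", "DCWest"]
LOWEST_KEY = 1


def route_by_range(location_key: int) -> str:
    """Return the data center whose key range contains location_key."""
    if not LOWEST_KEY <= location_key <= RANGE_UPPER_BOUNDS[-1]:
        raise ValueError(f"location key {location_key} is outside every range")
    # bisect_left finds the first range whose upper bound is >= the key.
    return DATA_CENTERS[bisect_left(RANGE_UPPER_BOUNDS, location_key)]


print(route_by_range(42))   # falls in 1 to 100
print(route_by_range(150))  # falls in 101 to 200
```

Keys 1 to 100 resolve to DCEast and keys 101 to 200 resolve to DCWest, matching the split described above. Any key outside both ranges is rejected instead of being silently assigned.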
MongoDB supports two types of sharding: range-based sharding (as explained above) and hashed sharding. What is hashed sharding? Hashed sharding uses a hash of a field as the shard key to partition the data across the cluster. Hashed sharding will usually give you good data distribution, provided the hashed field has high cardinality.
Know your data! We have many customers who come to us with severely unbalanced sharded data. Choosing a shard key with good distribution is imperative when you model your data. In some situations, sharding by a geographic key would not work for your data. Picking a shard key should be a one-time operation; changing the shard key can be very difficult, especially with large databases. Choose a shard key with good cardinality, meaning many different values.
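To see why hashing helps, here is a small simulation that compares the two approaches for a steadily increasing key. It is a simplified sketch: SHA-256 with a modulo stands in for the hash, which is not MongoDB's actual hash function or chunk assignment, and the shard count, key values, and function names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4         # hypothetical cluster size
KEYS_PER_RANGE = 250   # hypothetical range width for the range-based layout


def range_shard(key: int) -> int:
    """Range-based placement: neighbouring keys land on the same shard."""
    return min(key // KEYS_PER_RANGE, NUM_SHARDS - 1)


def hashed_shard(key: int) -> int:
    """Hashed placement: hash the key, then map the hash onto a shard."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# The 100 most recent inserts of a steadily increasing key.
recent_keys = range(900, 1000)
range_counts = Counter(range_shard(k) for k in recent_keys)
hashed_counts = Counter(hashed_shard(k) for k in recent_keys)

print("range-based:", dict(range_counts))
print("hashed:     ", dict(hashed_counts))
```

With range-based placement, every recent key lands on the last shard, so that shard takes all of the new writes. Hashing scatters the same keys across the shards, at the cost of no longer keeping neighbouring keys together for range queries.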
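One way to get to know your data before committing to a shard key is to measure the cardinality and skew of each candidate field on a sample. The sketch below uses made-up sample documents and a hypothetical `shard_key_report` helper. On a real deployment you would run the equivalent check against a representative sample of your own collection.

```python
from collections import Counter


def shard_key_report(documents, field):
    """Summarize how well a field's values would spread documents across shards."""
    counts = Counter(doc[field] for doc in documents)
    if not counts:
        raise ValueError("no documents to analyze")
    total = sum(counts.values())
    top_value, top_count = counts.most_common(1)[0]
    return {
        "cardinality": len(counts),      # number of distinct values
        "top_value": top_value,          # most frequent value
        "top_share": top_count / total,  # fraction of documents holding it
    }


# Made-up sample: 90 documents from "east", 10 from "west", unique user IDs.
sample = [
    {"user_id": i, "region": "east" if i < 90 else "west"}
    for i in range(100)
]

print(shard_key_report(sample, "region"))
print(shard_key_report(sample, "user_id"))
```

In this sample, `region` has only two distinct values and one of them holds 90 percent of the documents, so it would produce unbalanced shards. `user_id` has 100 distinct values with no dominant one, which makes it the better candidate by this measure.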
In the sections below, we answer some of the most commonly asked questions about MongoDB.
MongoDB is open source and is licensed under the Server Side Public License (SSPL) for all versions released after October 16, 2018.
Versions prior to that date were released under GNU AGPL v3.0.
MongoDB is one of the most popular NoSQL databases.
Yes, MongoDB supports horizontal scaling via sharding and replica sets.
For companies looking for a mature, scalable, and open source document database, MongoDB is an attractive option. It scales horizontally, and its In-Memory storage engine can provide fast access to ephemeral data by avoiding disk I/O.
That said, MongoDB can have sharp edges for companies working with large amounts of data – especially when it comes to sharding.
Enterprise Architect, OpenLogic by Perforce
Bill has over 25 years of experience working in various software roles related to full stack development including user interface, middleware, databases (RDBMS and NoSQL), security, DevOps, training, and mentorship. His primary focus is applying open source in the enterprise.