MongoDB Data Modelling Tutorial
In this MongoDB tutorial we learn the difference between traditional relational databases and NoSQL databases like MongoDB.
We also cover embedded and normalized data models and when to use each.
How does a traditional relational database work
In traditional relational databases like MySQL, we store our data in a table structure with columns and rows.
Each table has columns that define the data we want to store, and each row contains one entry.
As an example, let’s say we want to store some employee information like name and date of birth.
We would create a table in the database, with the name ‘employee’, that holds three columns called ‘id’, ‘name’ and ‘dob’.
The ‘id’ column is simply an auto-generated unique identifier for each entry in the table.
We can now add some employees to the table, one row per employee with a value in each column.
We can then search or filter employees by column, based on the data we want.
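As a rough illustration of the table described above, here is a minimal sketch that uses Python's built-in sqlite3 module in place of MySQL, so that it runs without a database server. The sample rows (and the second employee in particular) are made up for the example.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# The 'employee' table with the three columns described above.
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT NOT NULL, "
    "dob TEXT NOT NULL)"
)

# Hypothetical sample rows; the 'id' values are generated by the database.
conn.executemany(
    "INSERT INTO employee (name, dob) VALUES (?, ?)",
    [("John Doe", "1990-03-14"), ("Jane Roe", "1985-07-02")],
)

# Filter by the 'dob' column: everyone born on or after 1990-01-01.
rows = conn.execute(
    "SELECT id, name, dob FROM employee WHERE dob >= ? ORDER BY id",
    ("1990-01-01",),
).fetchall()
print(rows)
```

Storing the dates as ISO strings (YYYY-MM-DD) is a simplification that keeps text comparison and date order in agreement.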
How is MongoDB different from a relational database
Simply put, instead of splitting related data across several tables, MongoDB stores it together in a single document.
As an example, let’s consider that we want to create a blog with the following requirements:
- Every post has a unique title, description and URL
- Every post has the name of its author and any likes it got
- Every post may have multiple tags
- Every post can have zero or more comments with the commenter’s name, timestamp, message and likes
In a traditional database, we would use multiple tables linked by an ID.
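| id | title | description | url | author | likes |
| 01 | How to Mongo | Learn how to Mongo | /how-to-mongo/ | John Doe | 10 |

| id | post_id | user | comment | timestamp | likes |
| 01 | 01 | Jane | Thanks for the toot | 2020-11-09 | 4 |
| 02 | 01 | Jack | You forgot DB at the end | 2020-11-10 | 3 |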
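To show what "linked by an ID" means in practice, the following sketch loads the two tables into an in-memory SQLite database and joins them to reassemble one post with its comments. SQLite stands in for any relational database here, and the table and column names (posts, comments, post_id, message) are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables: each comment row points at its post through post_id.
conn.executescript(
    """
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY, title TEXT, description TEXT,
        url TEXT, author TEXT, likes INTEGER
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY, post_id INTEGER REFERENCES posts(id),
        user TEXT, message TEXT, timestamp TEXT, likes INTEGER
    );
    INSERT INTO posts VALUES
        (1, 'How to Mongo', 'Learn how to Mongo', '/how-to-mongo/', 'John Doe', 10);
    INSERT INTO comments VALUES
        (1, 1, 'Jane', 'Thanks for the toot', '2020-11-09', 4),
        (2, 1, 'Jack', 'You forgot DB at the end', '2020-11-10', 3);
    """
)

# Showing the post together with its comments requires a join on the ID.
rows = conn.execute(
    "SELECT p.title, c.user, c.message "
    "FROM posts AS p JOIN comments AS c ON c.post_id = p.id "
    "WHERE p.id = ? ORDER BY c.id",
    (1,),
).fetchall()
for title, user, message in rows:
    print(f"{title}: {user} said {message!r}")
```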
In MongoDB, we put all of the above in a single document that looks something like the following.
_id: 01
title: How to Mongo
description: Learn how to Mongo
author: John Doe
url: /how-to-mongo/
tags: [MongoDB, Database, Learn]
likes: 10
comments:
  - user: Jane
    comment: Thanks for the toot
    timestamp: 2020-11-09
    likes: 4
  - user: Jack
    comment: You forgot DB at the end
    timestamp: 2020-11-10
    likes: 3
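As you can imagine, there are several benefits to storing data this way, especially when it comes to read speed: a single lookup returns the post together with its tags and comments, with no joins across tables.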
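For readers who think in code, the single-document layout can be sketched as a plain Python dictionary, which is also the shape a driver such as PyMongo accepts when inserting a document. No database connection is made here, and writing the `_id` as the integer 1 is an assumption for the example.

```python
# The whole blog post, including tags and comments, lives in one document.
post = {
    "_id": 1,
    "title": "How to Mongo",
    "description": "Learn how to Mongo",
    "author": "John Doe",
    "url": "/how-to-mongo/",
    "tags": ["MongoDB", "Database", "Learn"],
    "likes": 10,
    "comments": [
        {"user": "Jane", "comment": "Thanks for the toot",
         "timestamp": "2020-11-09", "likes": 4},
        {"user": "Jack", "comment": "You forgot DB at the end",
         "timestamp": "2020-11-10", "likes": 3},
    ],
}

# Everything needed to render the page is reachable from this one object.
commenters = [c["user"] for c in post["comments"]]
total_comment_likes = sum(c["likes"] for c in post["comments"])
print(post["title"], post["tags"], commenters, total_comment_likes)
```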
Data Models: Embedded vs Normalized
MongoDB allows us to structure our database in two ways.
- We can embed everything in a single document, known as the embedded or de-normalized data model.
- We can split the data into separate documents and link them to the original document by using references, known as the normalized data model.
Let’s go back to our employee example and expand it a little, but this time we use MongoDB.
We’ll add the following details:
- Personal details: First and last name and date of birth
- Contact information: Mobile number and email address
- Address: Area, City, State/Province
First, let’s use the embedded data model and place everything in a single document, which will look similar to the one below.
_id: 01
personal:
    first_name: John
    last_name: Doe
    dob: 1990-03-14
contact:
    mobile_no: 123 555 4567
    email: firstname.lastname@example.org
address:
    area: Manhattan Beach
    city: Los Angeles
    state_prov: California
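The embedded employee document maps naturally onto nested dictionaries. The sketch below also includes `get_path`, a hypothetical helper (not part of MongoDB or any driver) that walks a dotted path such as `address.city`, loosely imitating the dot notation MongoDB uses to address fields inside embedded documents.

```python
from functools import reduce

# The embedded model: one document with three nested groups of fields.
employee = {
    "_id": 1,
    "personal": {"first_name": "John", "last_name": "Doe", "dob": "1990-03-14"},
    "contact": {"mobile_no": "123 555 4567",
                "email": "firstname.lastname@example.org"},
    "address": {"area": "Manhattan Beach", "city": "Los Angeles",
                "state_prov": "California"},
}


def get_path(document, dotted_path):
    """Hypothetical helper: follow a dotted path through nested dicts."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), document)


city = get_path(employee, "address.city")
dob = get_path(employee, "personal.dob")
print(city, dob)
```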
Now, let’s use the normalized data model and separate the example above into three documents.
_id: 02
emp_id: 01
first_name: John
last_name: Doe
dob: 1990-03-14

_id: 03
emp_id: 01
mobile_no: 123 555 4567
email: email@example.com

_id: 04
emp_id: 01
area: Manhattan Beach
city: Los Angeles
state_prov: California
As in a relational database, we link these documents to the main employee document with an ID, here the emp_id field.
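The linking by `emp_id` can be sketched with three in-memory lists standing in for collections. The names `personal`, `contact`, `address`, `find_by_emp_id` and `load_employee` are all hypothetical; against a real MongoDB server, each lookup would be a separate query (or part of an aggregation) rather than a list scan.

```python
# Three hypothetical "collections", one per kind of detail.
personal = [{"_id": 2, "emp_id": 1, "first_name": "John",
             "last_name": "Doe", "dob": "1990-03-14"}]
contact = [{"_id": 3, "emp_id": 1, "mobile_no": "123 555 4567",
            "email": "email@example.com"}]
address = [{"_id": 4, "emp_id": 1, "area": "Manhattan Beach",
            "city": "Los Angeles", "state_prov": "California"}]


def find_by_emp_id(collection, emp_id):
    """Return the first document that references emp_id, or None."""
    return next((doc for doc in collection if doc["emp_id"] == emp_id), None)


def load_employee(emp_id):
    """Reassemble one employee; each part costs its own lookup."""
    return {
        "_id": emp_id,
        "personal": find_by_emp_id(personal, emp_id),
        "contact": find_by_emp_id(contact, emp_id),
        "address": find_by_emp_id(address, emp_id),
    }


employee = load_employee(1)
print(employee["personal"]["first_name"], employee["address"]["city"])
```

The trade-off is visible in `load_employee`: the full record takes three lookups, but any single part can be read or updated without touching the others.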
So which one do you use then? This really depends on your requirements.
- If you don’t need all the information at the same time, separate it into multiple documents.
- If you’re going to use the data together, store everything in a single document.
Let’s consider our previous two examples, the blog data and employee information.
The normalized data model would be better suited to the employee information because we won’t always need all the data at the same time. If we just wanted to filter by date of birth to see whose birthday it is today, we would only need the data in the personal details document.
On the other hand, everything in the blog post document is typically needed because of the way a blog post is displayed. In this case the embedded data model is the better choice.
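To make the employee case concrete, here is a small sketch of the birthday lookup mentioned above. It reads only the personal details documents and never touches contact or address data. The second employee and the `birthdays_on` helper are made up for the example, and dates are assumed to be stored as YYYY-MM-DD strings.

```python
from datetime import date

# Only the personal details documents are needed for this query.
personal = [
    {"_id": 2, "emp_id": 1, "first_name": "John",
     "last_name": "Doe", "dob": "1990-03-14"},
    # Hypothetical second employee, added for illustration.
    {"_id": 5, "emp_id": 2, "first_name": "Jane",
     "last_name": "Roe", "dob": "1985-07-02"},
]


def birthdays_on(documents, today):
    """Return the emp_id of everyone whose birthday falls on 'today'."""
    month_day = today.strftime("%m-%d")
    # dob is 'YYYY-MM-DD', so the characters from index 5 are 'MM-DD'.
    return [doc["emp_id"] for doc in documents if doc["dob"][5:] == month_day]


result = birthdays_on(personal, date(2024, 3, 14))
print(result)
```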