MongoDB Aggregation Tutorial

In this MongoDB tutorial we learn how to group data and perform operations on that grouped data through the aggregation pipeline.

What is Aggregation

Basically, aggregation groups data from multiple documents, allowing us to perform operations on that grouped data and return a combined result.

This is comparable to using COUNT(*) with GROUP BY in SQL.
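As a rough sketch of that comparison, the pipeline below counts documents per value of a key, much like a grouped COUNT(*) in SQL. It assumes a results collection with a grade key, like the one used in the examples further down. The check for the shell's global db is only there so the snippet does nothing when loaded outside the MongoDB shell.

```javascript
// Roughly: SELECT grade, COUNT(*) AS total FROM results GROUP BY grade
const countPerGrade = [
    {
        $group: {
            _id: "$grade",        // the GROUP BY key
            total: { $sum: 1 }    // adds 1 per document, which acts like COUNT(*)
        }
    }
];

// `db` only exists inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
    db.results.aggregate(countPerGrade).forEach(printjson);
}
```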

Using MongoDB's find() method can in some cases be tedious.

For example, we might want to query embedded documents in a given key. The find() operation will always return the main document and it will fall to us to filter that data and select the key with the embedded documents. Then, we need to scan through them to get the ones that match our search criteria.

There’s no concise way to do this with find() alone, so we have to write a loop that goes through all the sub-documents until we have every result. This isn’t a problem if the database is small, but when we have to process millions of documents it can use a lot of memory, and the operation may even be terminated before we get all the documents we need.

Aggregation is the technique of grouping data to enhance this querying process.
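As a hedged sketch of the embedded-document case described above, the pipeline below asks the server to pick out the matching sub-documents instead of looping over them ourselves. The orders collection, its items array, and the customer and qty keys are hypothetical names invented for this illustration. The check for the shell's global db keeps the snippet harmless outside the MongoDB shell.

```javascript
// Hypothetical document shape (not part of this tutorial's data):
// { _id: 1, customer: "Ann", items: [ { sku: "a1", qty: 2 }, { sku: "b7", qty: 9 } ] }
const largeItems = [
    { $unwind: "$items" },                                  // one document per array element
    { $match: { "items.qty": { $gte: 5 } } },               // keep only the sub-documents we want
    { $project: { _id: 0, customer: 1, item: "$items" } }   // return just the matching part
];

// `db` only exists inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
    db.orders.aggregate(largeItems).forEach(printjson);
}
```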

The aggregation pipeline

In MongoDB, aggregations work as a pipeline.

As data moves through the pipeline, operations are performed on it at various stages until we receive the final combined results.

As an example, think of the process of water purification. Typically, this process goes through five stages.

  1. Chemical Coagulation, where certain chemicals are added to the water.
  2. Flocculation, where the water is stirred to form larger settleable particles for ease of removal by the next stage.
  3. Sedimentation, to remove suspended solids (particles) that are denser (heavier) than water.
  4. Disinfection, to kill or inactivate most microorganisms in the water.
  5. Filtration, to remove particulate impurities that were not removed during the sedimentation process.

Just like in the water purification pipeline, our data will also go through several stages.

  1. Project, to select specific keys from a collection.
  2. Match, to filter the data by a specified key to reduce the number of documents.
  3. Group, to do the actual combining.
  4. Sort, to sort the documents.

Optionally, the data can go through extra stages.

  • Skip, to skip forward in the list of documents.
  • Limit, to limit the number of documents to look at.
  • Unwind, to split a document that contains an array into one document per array element.
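To make the stage names above more concrete, here is a minimal sketch that chains several of them. It assumes a results collection with name and score keys, like the one created later in this tutorial. The threshold of 50 and the page size of 2 are arbitrary values chosen for illustration.

```javascript
const pagedScores = [
    { $match: { score: { $gte: 50 } } },           // Match: filter early to reduce the work
    { $project: { _id: 0, name: 1, score: 1 } },   // Project: keep only the keys we need
    { $sort: { score: -1 } },                      // Sort: highest score first
    { $skip: 2 },                                  // Skip: jump over the first two documents
    { $limit: 2 }                                  // Limit: return at most two documents
];

// `db` only exists inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
    db.results.aggregate(pagedScores).forEach(printjson);
}
```

Each stage receives the output of the one before it, so placing the filter first means the later stages handle fewer documents.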

How to aggregate documents

MongoDB supplies us with the db.collection.aggregate() method to combine documents.

This method takes two arguments.

  • pipeline, the sequence of data aggregation operations or stages.
  • options, a document of settings that modify the operation (optional).

Syntax:
db.collection.aggregate(pipeline, options)
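As an illustration of the second argument, the sketch below passes two settings that aggregate() accepts. The results collection and the values chosen here are assumptions made for the example, not recommendations.

```javascript
const passed = [
    { $match: { score: { $gte: 60 } } }
];

// Optional second argument: settings that change how the pipeline runs.
const aggregateOptions = {
    allowDiskUse: true,   // let memory-hungry stages write temporary files to disk
    maxTimeMS: 5000       // stop the operation if it runs longer than 5 seconds
};

// `db` only exists inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
    db.results.aggregate(passed, aggregateOptions).forEach(printjson);
}
```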

A basic pipeline uses the $match, $group and $sort stages, passed together in an array.

Syntax:
db.collection.aggregate([
    {
        $match: { key: value }
    },
    {
        $group: {
            _id: expression,
            key: { accumulator: expression }
        }
    },
    {
        $sort: { key: value }
    }
])

As an example, let’s say we have some exam results from students and we want to combine them.

Example:
db.results.insertMany([
    { "name": "John", "score": 53, "grade": "F" },
    { "name": "Jane", "score": 87, "grade": "A" },
    { "name": "Jack", "score": 76, "grade": "B" },
    { "name": "Jill", "score": 59, "grade": "F" }
])

To make the pipeline easier to understand, we’ll do it step by step.

A score of 60 or more means a student has passed, so let’s first match the students with 60 or more in the “score” key.

Example:
db.results.aggregate([
    {
        $match: { "score": {$gte : 60} }
    }
])

Next, we’ll choose the key to group by and store it as the _id, which every $group stage requires. We chose to group them by name.

Example:
db.results.aggregate([
    {
        $match: { "score": {$gte : 60} }
    },
    {
        $group: { _id : "$name" }
    }
])

Notice that we put a $ in front of the “name” key name to reference its value.

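Finally, we’ll sort them by “grade” in ascending order. A $group stage only outputs the keys we define in it, so we also carry “grade” over with the $first expression. Otherwise there would be no “grade” key left for $sort to work with.

Example:
db.results.aggregate([
    {
        $match: { "score": {$gte : 60} }
    },
    {
        $group: {
            _id : "$name",
            grade: { $first: "$grade" }
        }
    },
    {
        $sort:  { "grade": 1 }
    }
])

The output will be the two students that passed, sorted by grade from A downwards.

Output:
{ "_id" : "Jane", "grade" : "A" }
{ "_id" : "Jack", "grade" : "B" }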

Meta expressions for the aggregation method

MongoDB supplies us with several accumulator expressions that we can use inside the $group stage of the pipeline.

Expression    Description
$addToSet     Add values to an array in the resulting document, without duplicates
$avg          Calculate the average of the values from all the documents in a group
$first        Return the value from the first document in a group
$last         Return the value from the last document in a group
$max          Return the maximum of the values from all the documents in a group
$min          Return the minimum of the values from all the documents in a group
$push         Add values to an array in the resulting document
$sum          Sum the specified values from all the documents in a group
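As a minimal sketch of how these expressions fit into a $group stage, the pipeline below summarises the exam results per grade. It assumes the results collection from the earlier example. The output key names (students, average, lowest, highest, names) are arbitrary labels chosen for this illustration.

```javascript
const statsPerGrade = [
    {
        $group: {
            _id: "$grade",                  // one output document per grade
            students: { $sum: 1 },          // count the documents in the group
            average: { $avg: "$score" },    // mean score within the group
            lowest: { $min: "$score" },
            highest: { $max: "$score" },
            names: { $push: "$name" }       // collect every name into an array
        }
    },
    { $sort: { _id: 1 } }                   // order the groups by grade
];

// `db` only exists inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
    db.results.aggregate(statsPerGrade).forEach(printjson);
}
```

With the four sample documents, the F group would contain two students with an average score of 56.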