Spektor?.dev

Filtering Values In Nested Arrays In MongoDB

December 25, 2020

Usually when you make queries in MongoDB, the returned result is the whole document with all the fields unless you make a projection. However, sometimes you may want to filter some field, specifically an array field by a certain condition. There’re 3 options to achieve this:

  1. $elemMatch
  2. $filter
  3. $lookup with pipeline

$elemMatch

$elemMatch can be used in find operations. Suppose we have a collection named UserReview which contains reviews users made for a certain product and the documents also have a replies field which contains an array of comments made to a review. Each reply also has a status field which tells us whether a reply is approved or not to be displayed on the website. A document looks like so:

{
      _id: 1,
      username: "John",
      content: "I liked the product",
      replies: [
        {
          username: "Andy",
          content: "I liked the product",
          status: "REJECTED"
        },
        {
          username: "Bob",
          content: "cool",
          status: "APPROVED"
        }
      ]
    }

If we want to find user reviews only with approved replies we can run the following query from MongoDB shell:

db.UserReview.find({ "replies.status": "APPROVED" })

which will give us the following result:

[
  {
    "_id": 1,
    "content": "I liked the product",
    "replies": [
      {
        "content": "I liked the product",
        "status": "REJECTED",
        "username": "Andy"
      },
      {
        "content": "cool",
        "status": "APPROVED",
        "username": "Bob"
      }
    ],
    "username": "John"
  }
]

As can be seen the replies field contains both approved and rejected replies because MongoDB returns the document with all the fields by default. If we want to see only approved replies we can use the following projection:

db.UserReview.find(
  {
    "replies.status": "APPROVED",
  },
  {
    content: 1,
    username: 1,
    replies: {
      $elemMatch: {
        status: "APPROVED",
      },
    },
  }
)

This will return the following result:

[
  {
    "_id": 1,
    "content": "I liked the product",
    "replies": [
      {
        "content": "cool",
        "status": "APPROVED",
        "username": "Bob"
      }
    ],
    "username": "John"
  }
]

You can check out the query and play with it in this playground.

$filter

In aggregation pipeline $filter can be used. Given the same document as above the aggregation would be as follows:

db.UserReview.aggregate([
  {
    $match: {
      "replies.status": "APPROVED",
    },
  },
  {
    $project: {
      username: 1,
      content: 1,
      replies: {
        $filter: {
          input: "$replies",
          as: "reply",
          cond: {
            $in: ["$$reply.status", ["APPROVED"]],
          },
        },
      },
    },
  },
])

In $filter operation a temporary internal variable reply is created on which conditional operator cond is invoked. $$reply.status is prefixed with $$ in order to refer to the temporary variable. You can play with the aggregation here.

I personally prefer to use $addFields stage in this case because it allows to overwrite the original replies field. This way other user review fields like content and username do not need to be explicitly specified in projection:

db.UserReview.aggregate([
  {
    $match: {
      "replies.status": "APPROVED",
    },
  },
  {
    $addFields: {
      replies: {
        $filter: {
          input: "$replies",
          as: "reply",
          cond: {
            $in: ["$$reply.status", ["APPROVED"]],
          },
        },
      },
    },
  },
])

$lookup with pipeline

It’s worth mentioning that in a real life scenario user reviews and user replies may be stored in separate collections, therefore a $lookup stage will be required. In such case a special case of lookup can be used which uses pipeline operation.

Suppose the UserReview collections has the following document:

    {
      _id: 1,
      username: "John",
      content: "I liked the product"
    }

while the UserReply has the follwing documents:

    {
      _id: 11,
      username: "Andy",
      content: "I liked the product",
      userReviewId: 1,
      status: "REJECTED"
    },
    {
      _id: 12,
      username: "Bob",
      content: "cool",
      userReviewId: 1,
      status: "APPROVED"
    }

In such case the following aggregation can be performed (playground):

db.UserReview.aggregate([
  {
    $lookup: {
      from: "UserReply",
      let: {
        id: "$_id",
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $eq: ["$$id", "$userReviewId"],
            },
            status: "APPROVED",
          },
        },
      ],
      as: "replies",
    },
  },
])

Because only approved replies are matched within the lookup pipeline the join includes only approved replies in the first place, so no filtering is necessary. Thus the result would be:

[
  {
    "_id": 1,
    "content": "I liked the product",
    "replies": [
      {
        "_id": 12,
        "content": "cool",
        "status": "APPROVED",
        "userReviewId": 1,
        "username": "Bob"
      }
    ],
    "username": "John"
  }
]

However, if the aggregation were:

db.UserReview.aggregate([
  {
    $lookup: {
      from: "UserReply",
      localField: "_id",
      foreignField: "userReviewId",
      as: "replies",
    },
  },
  {
    $match: {
      "replies.status": "APPROVED",
    },
  },
])

a user review would be matched with all the replies where userReviewId equals to the _id of the user review and the replies field contains at least 1 approved reply. In such case $filter would need to be used in projection stage.