Spektor?.dev

Is MongoDB $lookup Really a Left Outer Join?

January 05, 2021

I had a case recently where I ran a $lookup stage in MongoDB aggregation. I had 2 collections: one which contains products (Product) and one which contains availability regions (Region). Each product may be available in certain regions, therefore Product documents have regions field which contains an array of regions the product is available in, for example:

{
    _id: 1,
    name: "Great Product",
    regions: [
        11,
        12
    ]
}

where each element in the regions array is an ObjectID of a Region document. In my aggregation I needed to return regions with their product. This can be done with a $lookup like so:

db.Region.aggregate([
  {
    $lookup: {
      from: "Product",
      let: {
        id: "$_id",
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $in: ["$$id", "$regions"],
            },
          },
        },
        {
          $project: {
            _id: 1,
            name: 1,
          },
        },
      ],
      as: "product",
    },
  },
  {
    $unwind: "$product",
  },
])

This worked but the results contained only the regions which were assigned a product. I visited MongoDB documentation on $lookup stage which clearly states that $lookup:

Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing.

Translating SQL terminology into “MongoDB speak” we get that even regions which don’t have a product assigned should be added in the result. So I decided to investigate the $unwind stage. Interestingly enough, it turns out that $unwind has preserveNullAndEmptyArrays parameter: if preserveNullAndEmptyArrays is false (which is the default value) and the looked up document is null, missing, or an empty array then $unwind does not output a document! This is the very reason which caused me to believe that $lookup was performing a left inner join. The following query sets the preserveNullAndEmptyArrays to true which returns even regions without a product:

db.Region.aggregate([
  {
    $lookup: {
      from: "Product",
      let: {
        id: "$_id",
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $in: ["$$id", "$regions"],
            },
          },
        },
        {
          $project: {
            _id: 1,
            name: 1,
          },
        },
      ],
      as: "product",
    },
  },
  {
    $unwind: {
      path: "$product",
      preserveNullAndEmptyArrays: true,
    },
  },
])

You can play with the query in the playground as well.