How To Solve GraphQL N+1 Problem By Using Custom Dataloaders
May 01, 2021
Many have heard about the famous N+1 problem in GraphQL. It often occurs when queries include relationships between entities, especially child-parent. For example, let’s say we want to make a GraphQL query for a list of purchases and the buyers in some supermarket.
Below are the entities definition in GraphQL SDL:
type Buyer {
id: ID!
name: String!
}
type Purchase {
id: ID!
buyerID: ID!
paymentAmount: Float!
}
type Query {
buyer(id: ID!): Buyer
purchases: [Purchase]!
}
Let’s say we want to make the following query:
{
query {
purchases {
paymentAmount
buyer {
name
}
}
}
}
In a typical implementation of GraphQL relationships (regardless of the framework be it Prisma.io or graphql-compose) the purchases
resolver will not contain the logic which fetches the buyer. Rather a relationship will be defined in the GraphQL library of your choice such that buyer
resolver will automatically be called in order to provide buyer
data to the purchases
resolver.
In order to find the buyer of a given purchase we first need to obtain the buyer id from Purchase
entity because in GraphQL an entity only has access to its parent. Suppose that the data is stored in a MongoDB database. First a query is made for a list of all purchases (let’s say there’re N
purchases). Then for each individual purchase a buyer
resolver is called. Therefore, N+1 queries are made to the database (one for the list of purchases and N for each buyer).
This is of course extremely inefficient because once the list of purchases along with buyer id’s is known just one MongoDB query needs to be performed in order to obtain the array of buyers.
If only there was a way to accumulate the buyer queries and delegate their execution to some logic which would perform the lookups and seamlessly integrate the result into purchases
resolver. Indeed such a way exists and it’s called a dataloader. This project was originally developed at Facebook and is not limited solely to GraphQL but rather to any use case where one needs to group and delay queries execution until some point (it can be applied to REST architecture as well). However, it is especially useful when solving the N+1 problem. In short, instead of executing buyer
resolver per each purchase, the buyer id is loaded int
o the dataloader. Once the purchases list is known, dataloader exposes the list of buyer ids to the callback provided to it. The callback can then make a MongoDB query to fetch the list of all relevant buyers and return them. Dataloader then takes care of merging the buyers with purchases! We’ve now gone from N+1 queries to the database to just 2. 🎉
Below is my implementation of the dataloader for a project which uses ApolloServer and graphql-compose-mongoose.
Firstly, it’s handy to have the dataloader helpers available at ApolloServer context
so that they can always be used with whatever entity without the need of importing the createDataLoaders
in each file (createDataLoaders
returns an object of dataloaders, its implementation is below):
const server = new ApolloServer({
schema: mySchema,
context: async ({ req, res }) => {
return {
dataLoader: createDataLoaders(),
}
},
})
This is the Purchase
model:
import { composeWithMongoose } from "graphql-compose-mongoose"
import { Purchase } from "./mongooseModels"
const PurchaseTC = composeWithMongoose(Purchase)
const purchasesResolver = "purchases"
PurchaseTC.addResolver({
name: purchasesResolver,
type: PurchaseTC,
resolve: async ({ source, args, context, info }) => Purchase.find({}).lean(),
})
// define the relationship between Purchase and Buyer
PurchaseTC.addFields({
buyer: {
type: "Buyer!",
resolve: async (source, args, context, info) => {
return (
// if buyer already exists then no need to look it up
// (it could exist because another Purchase resolver might perform
// an aggregation of purchases and buyers)
source.buyer ||
// source here includes all purchase data, including buyer id
context.dataLoader.purchaseToBuyer.load({ source, args, context, info })
)
},
},
})
schemaComposer.Query.addFields({
[purchasesResolver]: PurchaseTC.getResolver(purchasesResolver),
})
Finally, this is the implementation of the purchaseToBuyer
dataloader:
import DataLoader from "dataloader"
import { Buyer } from "./mongooseModels"
/*
There are a few constraints dataloader batch function must uphold:
1. The Array of values must be the same length as the Array of keys.
2. Each value in the array of values must correspond to the same key in the array of keys.
*/
const getPurchaseToBuyerDataLoader = () =>
new DataLoader(
// purchaseResolverParams is an array of graphql-compose-mongoose resolver params
async purchaseResolverParams => {
const relevantBuyers = await Buyer.find({
id: {
$in: purchaseResolverParams.map(
resolverParams => resolverParams.source.buyerID
),
},
}).lean()
// the map is used to quickly get the buyer based on the buyer id below
const buyersMap = new Map()
relevantBuyers.forEach(buyer => buyersMap.set(buyer.id, buyer))
return new Promise(resolve => {
resolve(
purchaseResolverParams.map(resolverParams => {
const purchase = resolverParams.source
const buyer = buyersMap.get(purchase.buyerID)
return (
buyer ||
new Error(`buyer for purchase id '${purchase.id}' not found`)
)
})
)
})
},
{
// dataloader caches objects with same key
cacheKeyFn: key => key.source.buyerID,
}
)
export const createDataLoaders = () => ({
purchaseToBuyer: getPurchaseToBuyerDataLoader(),
})
It can easily be checked that the above solution performs only 2 MongoDB queries using database logs. First the appropriate MongoDB log level needs to be set in order to observe all query executions: run this db.setLogLevel(2)
on your db in MongoDB shell. I personally like to to use NoSQLBooster for MongoDB because it provides a nice GUI to track MongoDB queries.