If you're not familiar with the GraphQL N+1 problem, consider the following query.
query {
  authors {
    id
    name
    books {
      id
      name
    }
  }
}
The "authors" resolver is executed once and returns a list of N authors. The "books" resolver is then executed once for each author. This is the N+1 problem because you execute N resolvers for the books plus 1 for the authors.
As your query becomes deeper, the number of resolver executions grows quickly. Let's assume there are 10 authors, each with 5 books. Fetching the authors and their books executes 11 resolvers (1 + 10), but if we also request publisher information for each book we now execute 61 resolvers (1 + 10 + 10 × 5).
query {
  authors {
    id
    name
    books {
      id
      name
      publisher {
        id
        name
      }
    }
  }
}
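The arithmetic above can be checked with a small helper. This is only an illustrative sketch: the function name countResolverExecutions and its input shape are assumptions made for this example, and it assumes every parent returns the same number of children.

```typescript
// Hypothetical helper: counts resolver executions for a nested query.
// perParentCounts[i] is how many items each parent returns at nesting
// level i, for every level that has a nested resolver beneath it.
function countResolverExecutions(perParentCounts: number[]): number {
  let total = 1; // the single top-level resolver ("authors")
  let parents = 1; // number of parent objects at the current level
  for (const count of perParentCounts) {
    parents *= count; // objects produced at this level
    total += parents; // each object triggers the nested resolver once
  }
  return total;
}

// Authors and books only: 1 + 10
console.log(countResolverExecutions([10]));
// Authors, books and publishers: 1 + 10 + 10 * 5
console.log(countResolverExecutions([10, 5]));
```

Each extra level of nesting multiplies the number of parent objects, which is why deeper queries get expensive so quickly.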
When configuring AppSync to use a Lambda function as a resolver, you can use either the Invoke or BatchInvoke operation.
The Invoke operation causes AppSync to invoke your Lambda function every time it needs to execute the resolver. For the "authors" resolver in our example that is acceptable because it is only executed once.
import { AppSyncResolverHandler } from 'aws-lambda';

export const authorsHandler: AppSyncResolverHandler<unknown, { id: string; name: string }[]> = async (event) => {
  const authors = [
    { id: '1', name: 'Tajeddigt Olufemi' },
    { id: '2', name: 'Lelio Miodrag' },
    { id: '3', name: 'Aineias Vladimir' },
    { id: '4', name: 'Sachin Lamya' },
    { id: '5', name: 'Kamakshi Cosme' },
    { id: '6', name: 'James Sharmila' },
    { id: '7', name: 'Holden Wulfflæd' },
    { id: '8', name: 'Quidel Bahdan' },
    { id: '9', name: 'Reinout Johanna' },
    { id: '10', name: 'Til Nikica' },
    { id: '11', name: 'Metoděj Maxima' },
    { id: '12', name: 'Đình Ester' },
    { id: '13', name: 'Màxim Kristina' },
    { id: '14', name: 'Yedidia Jafar' },
    { id: '15', name: 'Lone Mariusz' },
    { id: '16', name: 'Vitya Franjo' },
    { id: '17', name: 'Malvolio Lochlann' },
    { id: '18', name: 'Evette Dierk' },
    { id: '19', name: 'Nnenna Basileios' },
    { id: '20', name: 'Dmitrei Iya' },
  ];
  return authors;
};
For the "books" resolver we really want to use the BatchInvoke operation. To use it with the direct Lambda resolver integration, add the MaxBatchSize property to your resolver definition.
BooksResolver:
  Type: AWS::AppSync::Resolver
  Properties:
    ApiId: !GetAtt Api.ApiId
    DataSourceName: !GetAtt BooksDataSource.Name
    TypeName: Author
    FieldName: books
    MaxBatchSize: 1000
AppSync will now invoke your Lambda with an array of up to MaxBatchSize events. This allows your Lambda to process multiple resolver executions at once instead of individually.
import { AppSyncBatchResolverHandler } from 'aws-lambda';

export const lambdaHandler: AppSyncBatchResolverHandler<
  unknown,
  { id: string; name: string }[],
  { id: string; name: string }
> = async (events) => {
  return events.map((event) => {
    const books: { id: string; name: string }[] = [];
    for (let i = 1; i <= 20; i++) {
      books.push({ id: `${event.source.id}-${i}`, name: `Book ${i}` });
    }
    return books;
  });
};
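To see the shape AppSync expects back, it can help to call a batch handler locally. The sketch below is a simplified stand-in for the handler above: the local types and the names booksBatchHandler and sampleEvents are assumptions made for illustration, and it generates only two books per author. The point to notice is that the result array has one entry per event, in the same order as the events.

```typescript
// Simplified local types; assumptions for this sketch, not the aws-lambda definitions.
interface Author { id: string; name: string }
interface Book { id: string; name: string }
interface BooksBatchEvent { source: Author }

// Same idea as the batch handler above, reduced to two books per author.
const booksBatchHandler = async (events: BooksBatchEvent[]): Promise<Book[][]> =>
  events.map((event) =>
    [1, 2].map((i) => ({ id: `${event.source.id}-${i}`, name: `Book ${i}` }))
  );

// Simulate the single invocation AppSync would make for three authors.
const sampleEvents: BooksBatchEvent[] = [
  { source: { id: '1', name: 'Tajeddigt Olufemi' } },
  { source: { id: '2', name: 'Lelio Miodrag' } },
  { source: { id: '3', name: 'Aineias Vladimir' } },
];

booksBatchHandler(sampleEvents).then((results) => {
  // results[i] holds the books for sampleEvents[i].
  console.log(results.length);
  console.log(results[1].map((book) => book.id));
});
```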
By handling a batch of resolver executions in a single Lambda invocation, you can improve performance by reducing the number of cold starts and lower costs by making fewer Lambda invocations.
Tip: In production you may want to experiment with values for MaxBatchSize, as you are trading off the risk of cold starts and cost against the ability to process requests in parallel.
What about our publishers?
You should always use BatchInvoke
for nested Lambda resolvers when possible. At each level of nesting AppSync will determine the events that need to be sent to your Lambda resolver then pass them in array of MaxBatchSize
length resulting in the minimum number of Lambda functions being executed. With a sufficiently high enough MaxBatchSize
value you may only need to invoke one Lambda function for each level of nesting in your query.
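As a rough way to reason about this, the number of invocations at each level is the number of events divided by MaxBatchSize, rounded up. The helper below is a hypothetical sketch, not anything AppSync provides, and it assumes AppSync always fills each batch to the maximum, so treat the result as a best case.

```typescript
// Hypothetical estimate of Lambda invocations for nested batch resolvers.
// eventsPerLevel[i] is the number of resolver executions at nesting level i.
// Assumes every batch is filled up to maxBatchSize (a best case).
function countLambdaInvocations(eventsPerLevel: number[], maxBatchSize: number): number {
  return eventsPerLevel.reduce(
    (sum, events) => sum + Math.ceil(events / maxBatchSize),
    0
  );
}

// 10 "books" executions and 50 "publisher" executions from the example.
const nestedLevels = [10, 50];

console.log(countLambdaInvocations(nestedLevels, 1000)); // one invocation per level
console.log(countLambdaInvocations(nestedLevels, 20)); // publishers split into three batches
console.log(countLambdaInvocations(nestedLevels, 1)); // equivalent to using Invoke
```

With a batch size of 1 the count matches the 60 nested resolver executions from the earlier example, which is what Invoke would give you.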
By using BatchInvoke you may also be able to optimize your resolvers through more efficient processing of requests.
In our example we are requesting information about the publisher for each book. It's likely that publishers will be repeated many times. If we used Invoke then each resolver execution would use GetItem to fetch the publisher record from DynamoDB. This means the same publisher record would be retrieved multiple times. With BatchInvoke we can build a list of unique publisher IDs, use BatchGetItem to fetch them all at once from DynamoDB, then map over the events to return the correct publisher record for each resolver. This reduces our usage of DynamoDB and makes the query quicker.