API Gateway authorization and policy caching

A few weeks ago I responded to a question about API Gateway custom authorizers and how API Gateway caches the policy they return.

With custom authorizers you have two options:

1. Return a list of policy statements covering every resource the user needs to access. This is what you will want most of the time. API Gateway caches the returned policy and reuses it for later requests carrying the same token, even requests to a different resource, so the policy has to cover all of them. You can use a wildcard, or you can list each resource as its own policy statement.

2. Set the TTL to 0 so the policy is never cached. The custom authorizer will then run for every request. Depending on your authentication mechanism, this may allow you to cut someone off immediately.
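A minimal sketch of option 1 in Python, assuming a TOKEN authorizer whose event carries a `methodArn`. The route list `TEAM_LEADER_ROUTES` and the helpers `api_arn_prefix` and `build_policy` are hypothetical. The policy lists every route the caller may use, one statement per resource, rather than only the `methodArn` of the current request.

```python
from typing import Any

# Hypothetical routes a team leader may call; adjust to your own API.
TEAM_LEADER_ROUTES: list[tuple[str, str]] = [
    ("GET", "expenses"),
    ("PUT", "expenses/*"),
]


def api_arn_prefix(method_arn: str) -> str:
    """Turn '...:{apiId}/{stage}/{verb}/{path}' into '...:{apiId}/{stage}'."""
    arn_and_api_id, stage = method_arn.split("/")[:2]
    return f"{arn_and_api_id}/{stage}"


def build_policy(
    principal_id: str, method_arn: str, routes: list[tuple[str, str]]
) -> dict[str, Any]:
    """Allow every listed route, one statement per resource."""
    prefix = api_arn_prefix(method_arn)
    statements = [
        {
            "Effect": "Allow",
            "Action": "execute-api:Invoke",
            "Resource": f"{prefix}/{verb}/{path}",
        }
        for verb, path in routes
    ]
    return {
        "principalId": principal_id,
        "policyDocument": {"Version": "2012-10-17", "Statement": statements},
    }


# Usage: the request was for GET /expenses, but the policy also covers PUT.
policy = build_policy(
    "alice",
    "arn:aws:execute-api:us-east-1:123456789012:abcdef123/prod/GET/expenses",
    TEAM_LEADER_ROUTES,
)
for statement in policy["policyDocument"]["Statement"]:
    print(statement["Resource"])
```

Because the resources are built from the API and stage prefix instead of being copied from `methodArn`, a cached copy of this policy still allows the same token to call `PUT /expenses/{expenseId}` after first calling `GET /expenses`.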

Where possible I would go with option 1 over option 2.

Yesterday I was asked why option 1 is preferable. It's a very good question and one that I think deserves a detailed answer. Beyond the superficial answer, that setting the TTL to 0 triggers two Lambda invocations (the authorizer and the handler) for every API call, there is a more fundamental reason to use option 1.
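To put rough numbers on the superficial answer, here is a back-of-the-envelope comparison. It assumes, as a best case, that each user's cached policy is reused for a whole TTL window, so treat the cached figure as a lower bound. The traffic figures and function names are illustrative only.

```python
def invocations_without_cache(requests: int) -> int:
    # TTL = 0: the authorizer and the backend Lambda both run on every request.
    return 2 * requests


def invocations_with_cache(requests: int, users: int, ttl_windows: int) -> int:
    # Best case: the authorizer runs once per user per TTL window (and never
    # more than once per request); the backend Lambda still runs every time.
    authorizer_runs = min(requests, users * ttl_windows)
    return requests + authorizer_runs


# Hypothetical hour of traffic: 10,000 requests from 50 users with a
# 300-second TTL, i.e. 3600 / 300 = 12 cache windows.
print(invocations_without_cache(10_000))       # 20000
print(invocations_with_cache(10_000, 50, 12))  # 10600
```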

Broad versus Fine-grained Access Controls

Let's suppose you work for a small company and part of your job is approving the monthly expenses for staff. The business is small, so you can approve expenses for anyone. This is broad access control: you need to be in the right role or group to perform an action, but once you're in that group there are no further restrictions.

After a few years the company has grown. You're now one of many team leaders, and approving monthly expenses is still part of your job, but you're only allowed to approve expenses for staff in your own team. This is fine-grained access control: whether you can perform an action also depends on who you're performing it for.

How does this apply to the API Gateway?

Authorizers are for broad access controls. Fine-grained rules can get much more specific: continuing the previous example, team leaders might only be able to approve an expense if it is for a member of their team and less than $500.

As a team leader, I'm allowed to view a list of the monthly expense claims I can approve by sending a GET request to /expenses. API Gateway uses a custom authorizer to implement broad access control, stopping anyone who isn't a team leader from making that request. The Lambda that handles the request uses business logic to implement fine-grained access control, so that only the expenses I can view or approve are included in the response.
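A minimal sketch of that split, assuming a REST API with a TOKEN authorizer and a Lambda proxy integration. The token table, the `team-leaders` group name and the expense records are hypothetical stand-ins for real token verification and a real data store. The authorizer only answers "is this caller a team leader?" and passes the caller's team along in `context`; the handler reads it from `requestContext.authorizer` and does the fine-grained filtering.

```python
import json
from typing import Any

# Hypothetical stand-in for real token verification (e.g. validating a JWT).
TOKENS = {
    "token-alice": {"sub": "alice", "groups": ["team-leaders"], "team": "blue"},
    "token-bob": {"sub": "bob", "groups": ["staff"], "team": "blue"},
}

# Hypothetical expense records; in practice this would be a database query.
EXPENSES = [
    {"id": "e1", "team": "blue", "amount": 120},
    {"id": "e2", "team": "red", "amount": 80},
    {"id": "e3", "team": "blue", "amount": 900},
]


def authorizer(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Broad access control: allow team leaders, deny everyone else."""
    claims = TOKENS.get(event["authorizationToken"])
    if claims is None:
        # API Gateway maps this exact message to a 401 response.
        raise Exception("Unauthorized")
    is_leader = "team-leaders" in claims["groups"]
    arn_and_api_id, stage = event["methodArn"].split("/")[:2]
    return {
        "principalId": claims["sub"],
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow" if is_leader else "Deny",
                "Action": "execute-api:Invoke",
                # Wildcard: every method and path in this stage. Assumes this
                # hypothetical API only has team-leader routes.
                "Resource": f"{arn_and_api_id}/{stage}/*",
            }],
        },
        # These values reach the backend Lambda via requestContext.authorizer.
        "context": {"team": claims["team"]},
    }


def list_expenses_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Fine-grained access control: only the caller's own team's expenses."""
    team = event["requestContext"]["authorizer"]["team"]
    visible = [e for e in EXPENSES if e["team"] == team]
    return {"statusCode": 200, "body": json.dumps(visible)}
```

The authorizer never touches expense data, so its result can be cached safely; the handler is the only place that knows which expenses belong to which team.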

Similarly, when I attempt to approve an expense by sending a PUT request to /expenses/{expenseId}, the broad access control in API Gateway lets me through as a team leader and rejects everyone else. The business logic that decides whether I can approve that particular expense (fine-grained access control) belongs in the Lambda, which has access to the data for that expense.
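A sketch of the PUT /expenses/{expenseId} handler under similar assumptions: a Lambda proxy integration, the caller's team passed through the authorizer `context`, and hypothetical in-memory data in place of a real table. It applies the example's fine-grained rule (own team and less than $500) to the expense it has just loaded, and returns 403 when the rule fails even though the authorizer allowed the request.

```python
import json
from typing import Any

APPROVAL_LIMIT = 500  # the "$500" rule from the example

# Hypothetical expense records keyed by ID; in practice a database lookup.
EXPENSES = {
    "e1": {"team": "blue", "amount": 120, "status": "pending"},
    "e2": {"team": "red", "amount": 80, "status": "pending"},
    "e3": {"team": "blue", "amount": 900, "status": "pending"},
}


def respond(status: int, body: dict[str, Any]) -> dict[str, Any]:
    return {"statusCode": status, "body": json.dumps(body)}


def approve_expense_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    expense_id = event["pathParameters"]["expenseId"]
    team = event["requestContext"]["authorizer"]["team"]
    expense = EXPENSES.get(expense_id)
    if expense is None:
        return respond(404, {"message": "Expense not found"})
    # Fine-grained access control: it needs the expense itself, so it lives
    # here rather than in the authorizer.
    if expense["team"] != team or expense["amount"] >= APPROVAL_LIMIT:
        return respond(403, {"message": "You cannot approve this expense"})
    expense["status"] = "approved"
    return respond(200, {"id": expense_id, "status": "approved"})


def put_event(expense_id: str, team: str) -> dict[str, Any]:
    # Only the fields of a proxy event that the handler actually reads.
    return {
        "pathParameters": {"expenseId": expense_id},
        "requestContext": {"authorizer": {"team": team}},
    }


for expense_id in ["e1", "e2", "e3", "e9"]:
    status = approve_expense_handler(put_event(expense_id, "blue"), None)["statusCode"]
    print(expense_id, status)  # e1 200, e2 403, e3 403, e9 404
```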

You could try to put this logic into the custom authorizer, and at a small scale it may work. As your application grows beyond a few Lambdas, you will quickly discover that your authorizer has turned into a monolith that needs access to almost every resource in your application (DynamoDB, Elasticsearch, S3, etc.) and has to implement every business rule. It will quickly become a very large and complicated function.

To prevent this, the best solution is to use custom authorizers for broad access control and leave fine-grained access control with the rest of the business logic in your Lambdas. This approach also makes it much easier to break your application into microservices later, because the rules for accessing data already live in the Lambda and move with it into the microservice. It will also let you grow much larger teams, because you won't have every developer trying to squeeze their new business rule into one authorizer.