Improve Protected Environments efficiency/performance as a preparation for group-level feature
Problem
While we were working on the Group-level protected environments feature, we realized that the current project-level protected environments had performance/architectural issues:
- There is a N+1 problem when authorizing a user to access to an protected environment. This is a long standing issue. e.g. https://gitlab.com/gitlab-org/gitlab/-/issues/321788
- We're lacking of cache mechanizm to reduce the number of authorization processes. Currently, we execute database queries in every single authorization.
- We don't have SSOT facility to maintain the authorization logic thus it's hard to be extended for group-level feature.
- This issue is also affected by Direct relationship between Pipeline Job and Environment.
The N+1 problems are currently mitigated by a temporary caching, however, since it's tightly related to project context, we can't easily extend it to group-level.
We tried to find a way to build the group-level feature without addressing these issues, however, it'd be quite challenging to pass the performance reviews with the current shape. Especially, group-level feature requires more complex-and-expensive database query execution, meaning there is a risk to be an SLA issue and required to rework in the future.
Proposal
With keeping in mind that Group-level protected environments feature implementation comes at next iteration, we should improve the following things:
- Introduce SSOT facility for the authorization to make the auth logic container agnostic.
- Refactor the project-specific caching for the SSOT facility.
- Introduce a new caching logic as described below.
Given that there are N jobs, the following process happens per authorization.
- Can the user executes the job?
- Will the job interact with environment?
- Does the user have an access to the environment?
- Fetch protected environments row. (Cached with RequestStore)
- Fetch deploy access levels. (Cached with RequestStore)
- Evaluate (sometimes executes a query to
namespacesorcheck_access)
The point is that we should cache "Does the user have an access to the environment?" result
in RequestStore. This works for both project and group levels.