Improve user experience with CI variables when GraphQL requests time out
Problem to Solve
Improve UX of prefilled variables for CI Pipelines when GraphQL times out due to PGBouncer saturation.
When users navigate to Build → Pipelines → Run Pipeline, GitLab attempts to prefill CI/CD variables defined in .gitlab-ci.yml. This feature uses a GraphQL query that relies on ReactiveCache (a Sidekiq background job) to fetch and cache these variables.
The issue: When Sidekiq is saturated or experiencing high load, the cache population job gets queued but never executes. In this scenario, the GraphQL query returns an empty response (null) instead of an error. This creates a silent failure where users cannot distinguish between:
- "Your CI config has no variables to prefill" (legitimate empty state)
- "We failed to fetch your variables due to infrastructure issues" (error state)
This issue was discovered during INC-446 when PGBouncer saturation caused GraphQL timeouts, leaving users confused about why their expected variables weren't appearing.
Current Behavior
- User navigates to the Run Pipeline page
- Frontend queries ciConfigVariables via GraphQL
- Backend checks ReactiveCache:
  - If cache exists → returns variables
  - If cache is expired/missing → queues Sidekiq job and returns null
- Frontend receives null and displays an empty variables form
- User has no indication whether variables are loading, failed to load, or genuinely don't exist
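The flow above can be illustrated with a minimal, self-contained sketch. It is not GitLab's ReactiveCaching implementation: FakeReactiveCache, its methods, and the sample variable are hypothetical stand-ins that only mimic the behavior described here (return the cached value if present, otherwise queue a job and return nil).

```ruby
# Hypothetical, simplified model of the flow described above.
# None of these names are GitLab's real classes or methods.
class FakeReactiveCache
  attr_reader :queue

  def initialize
    @store = {}
    @queue = [] # stands in for the Sidekiq queue
  end

  # Returns the cached value if present; otherwise queues a
  # refresh job and returns nil.
  def fetch(key, &calculate)
    return @store[key] if @store.key?(key)

    @queue << [key, calculate]
    nil
  end

  # Simulates a Sidekiq worker draining the queue.
  def run_jobs
    until @queue.empty?
      key, calculate = @queue.shift
      @store[key] = calculate.call
    end
  end
end

cache = FakeReactiveCache.new

# Saturated Sidekiq: the job is queued but has not run, so the caller gets nil.
pending_result = cache.fetch("project-1") { { "DEPLOY_ENV" => "staging" } }

# Healthy Sidekiq: once the job runs, the next call returns the variables.
cache.run_jobs
ready_result = cache.fetch("project-1") { { "DEPLOY_ENV" => "staging" } }
```

In this sketch, as long as run_jobs is never called, every fetch keeps returning nil, which is the silent failure the frontend currently renders as an empty form.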
Expected Behavior
When the cache is not ready (being populated by Sidekiq):
- Backend service (app/services/ci/list_config_variables_service.rb) should raise an error instead of returning null
- GraphQL resolver (app/graphql/types/project_type.rb) should propagate this error to the frontend
- Frontend (app/assets/javascripts/ci/pipeline_new/components/pipeline_variables_form.vue) should display an error message like: "Unable to load CI variables. The system may be experiencing high load. Please try again."
- Users understand they need to retry rather than assuming no variables exist
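One possible shape for this behavior is sketched below in plain Ruby. It assumes a dedicated error class and a resolver-like wrapper. VariablesNotReadyError, list_config_variables, and resolve_ci_config_variables are hypothetical names for illustration, not the actual service or resolver code, and the real GraphQL error handling may look different.

```ruby
# Hypothetical sketch: a dedicated error for "cache not ready" so callers
# can tell it apart from a legitimately empty result.
class VariablesNotReadyError < StandardError
  def initialize(msg = "Unable to load CI variables. The system may be " \
                       "experiencing high load. Please try again.")
    super
  end
end

# Stand-in for the service. cached_value is nil when the cache has not
# been populated yet, and a (possibly empty) hash once it has.
def list_config_variables(cached_value)
  raise VariablesNotReadyError if cached_value.nil?

  cached_value
end

# Stand-in for the resolver: turns the error into a payload that a
# frontend could render as an error message instead of an empty form.
def resolve_ci_config_variables(cached_value)
  { data: list_config_variables(cached_value), errors: [] }
rescue VariablesNotReadyError => e
  { data: nil, errors: [e.message] }
end

empty_config = resolve_ci_config_variables({})  # config with no variables
not_ready    = resolve_ci_config_variables(nil) # cache still being populated
```

With this split, an empty hash and a non-empty errors list are distinct outcomes, so the form can show "no variables" in one case and a retry prompt in the other.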
Original description
Investigation
It looks like the GraphQL query calls https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/graphql/types/project_type.rb#L937-945, which uses ListConfigVariablesService.
The service itself uses reactive caching: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/ci/list_config_variables_service.rb#L5
ReactiveCache is itself a Sidekiq job, and it is triggered each time the lifetime period ends. By default this lifetime is 10.minutes: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/reactive_caching.rb#L32
So when Sidekiq is saturated or not responding, the variables are never calculated.
Proposals to Consider
- Can we show an error to the user when this occurs?
I don't think the GraphQL request was failing. It was just returning an empty response, which is still valid. Because the Sidekiq job was queued up, we didn't know whether there were variables that were not showing or whether there were no variables at all.
Can we distinguish between variables being empty and the GraphQL endpoint failing to return any values?
- Should we be supporting "experimental" features/flags? e.g.
One thing I noticed just now is that this whole thing is still marked "experiment: milestone 15.3". It should probably be moved out of the experiment state if customers are already using and expecting it.
This ticket was created from INC-446 and was automatically exported by incident.io