Improve user experience with CI variables when GraphQL requests time out

Problem to Solve

Improve the UX of prefilled variables for CI pipelines when GraphQL requests time out due to PgBouncer saturation.

When users navigate to Build → Pipelines → Run Pipeline, GitLab attempts to prefill CI/CD variables defined in .gitlab-ci.yml. This feature uses a GraphQL query that relies on ReactiveCache (a Sidekiq background job) to fetch and cache these variables.

The issue: When Sidekiq is saturated or experiencing high load, the cache population job gets queued but never executes. In this scenario, the GraphQL query returns an empty response (null) instead of an error. This creates a silent failure where users cannot distinguish between:

  • "Your CI config has no variables to prefill" (legitimate empty state)
  • "We failed to fetch your variables due to infrastructure issues" (error state)

This issue was discovered during INC-446, when PgBouncer saturation caused GraphQL timeouts and left users confused about why their expected variables weren't appearing.

Current Behavior

  1. User navigates to the Run Pipeline page
  2. Frontend queries ciConfigVariables via GraphQL
  3. Backend checks ReactiveCache:
    • If cache exists → returns variables
    • If cache is expired/missing → queues Sidekiq job and returns null
  4. Frontend receives null and displays an empty variables form
  5. User has no indication whether variables are loading, failed to load, or genuinely don't exist
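
The flow above can be sketched with a small in-memory stand-in. This is only an illustration: FakeReactiveCache, its methods, and the cache key are hypothetical and do not mirror GitLab's actual ReactiveCaching code. It shows how a cache miss returns nil while the refresh job merely sits in a queue.

```ruby
# Simplified, in-memory stand-in for the flow above. The class and its
# methods are hypothetical, not GitLab's ReactiveCaching implementation.
class FakeReactiveCache
  attr_reader :queued_jobs

  def initialize
    @store = {}        # cache key => cached list of variables
    @queued_jobs = []  # stands in for the Sidekiq queue
  end

  # Returns the cached value, or queues a refresh job and returns nil.
  def fetch(key)
    return @store[key] if @store.key?(key)

    @queued_jobs << key unless @queued_jobs.include?(key)
    nil
  end

  # Simulates a Sidekiq worker draining the queue. If this is never
  # called (saturated Sidekiq), every fetch keeps returning nil.
  def run_queued_jobs
    @queued_jobs.each { |key| @store[key] = yield(key) }
    @queued_jobs.clear
  end
end

cache = FakeReactiveCache.new

first = cache.fetch("project-1:main")    # cache miss: job queued, nil returned
cache.run_queued_jobs { |_key| [] }      # worker runs; this config defines no variables
second = cache.fetch("project-1:main")   # cache hit: empty list

puts first.inspect
puts second.inspect
```

In this sketch the first result is nil only because the job has not run yet, while the second is an empty list because the config really has no variables. If the frontend renders both as an empty form, the two cases look the same to the user.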

Expected Behavior

When the cache is not ready (being populated by Sidekiq):

  1. Backend service (app/services/ci/list_config_variables_service.rb) should raise an error instead of returning null
  2. GraphQL resolver (app/graphql/types/project_type.rb) should propagate this error to the frontend
  3. Frontend (app/assets/javascripts/ci/pipeline_new/components/pipeline_variables_form.vue) should display an error message like: "Unable to load CI variables. The system may be experiencing high load. Please try again."
  4. Users understand they need to retry rather than assuming no variables exist
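
A minimal sketch of the proposed behavior, assuming a dedicated error class. CacheNotReadyError, ListConfigVariablesSketch, resolve_ci_config_variables, and form_state are hypothetical names introduced for illustration, and the response shape is a simplified approximation of a GraphQL response, not the actual GitLab code.

```ruby
# Hypothetical error class; the name is an assumption for illustration.
class CacheNotReadyError < StandardError; end

# Stand-in for the service: raises instead of returning nil when the
# cache has not been populated yet.
class ListConfigVariablesSketch
  def initialize(cache)
    @cache = cache # plain Hash standing in for ReactiveCache
  end

  def execute(ref)
    raise CacheNotReadyError, "CI variables are not cached yet for #{ref}" unless @cache.key?(ref)

    @cache[ref]
  end
end

# Stand-in for the resolver: turns the service error into an "errors"
# entry in the response rather than a bare null.
def resolve_ci_config_variables(service, ref)
  { data: { ciConfigVariables: service.execute(ref) }, errors: [] }
rescue CacheNotReadyError => e
  { data: { ciConfigVariables: nil }, errors: [{ message: e.message }] }
end

# Stand-in for the frontend decision about what to render.
def form_state(response)
  return :error unless response[:errors].empty?

  response[:data][:ciConfigVariables].empty? ? :no_variables : :prefilled
end

service = ListConfigVariablesSketch.new(
  "main" => [],                                    # cached, no variables defined
  "develop" => [{ key: "ENV", value: "staging" }]  # cached, one variable
)

empty_state  = form_state(resolve_ci_config_variables(service, "main"))
filled_state = form_state(resolve_ci_config_variables(service, "develop"))
error_state  = form_state(resolve_ci_config_variables(service, "feature")) # not cached

puts empty_state, filled_state, error_state
```

With an explicit error in the response, the form can tell three cases apart: no variables defined, variables prefilled, and variables not available yet. Only the last one would show the retry message.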

Original description

Investigation

It looks like the GraphQL query calls https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/graphql/types/project_type.rb#L937-945, which uses ListConfigVariablesService.

The service itself uses ReactiveCaching: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/ci/list_config_variables_service.rb#L5

ReactiveCache is itself a Sidekiq job, and it triggers each time a time period ends. By default this lifetime is 10.minutes: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/reactive_caching.rb#L32

So when Sidekiq is saturated or not responding, the job does not run and no variables are calculated.

Proposals to Consider

  1. Can we show an error to the user when this occurs?

I don't think the GraphQL request was failing. It was just returning an empty response, which is still valid. Because the Sidekiq job was queued up, we didn't know whether there were variables that were not showing or whether there were no variables at all.

Can we distinguish between variables being empty and the GraphQL endpoint failing to return any values?

  2. Should we be supporting "experimental" features/flags? e.g.

One thing I noticed just now is that this whole thing is still marked "experiment: milestone 15.3", a status it should probably be moved out of if customers are already using and expecting it.

This ticket was created from INC-446 and was automatically exported by incident.io 🔥
