[go: up one dir, main page]

Skip to content

Lower access change priority of DeleteUserWorker from InactiveTokensDeletionCronWorker cron

What does this MR do and why?

TL;DR

  • This change helps improve the Authorization Team's error budget by reducing load on long-running workers (AuthorizedProjectsWorker).

  • The only functional impact is that user deletions triggered by InactiveTokensDeletionCronWorker may experience a delay of up to one minute. Which should be ok, given that the process already runs via a nightly Cronjob and is not tied to immediate user actions.

  • The expected outcome of merging this MR is we should see significantly less calls to AuthorizedProjectsWorker from DeleteUserWorker (lowering the overall number of calls to AuthorizedProjectsWorker).

The Longer Version

The Authorization Team is currently working to reduce the pressure on AuthorizedProjectsWorker, a long-running background job that frequently exceeds 200–300 seconds in execution time and contributes to Sidekiq saturation, impacting our Appdex and SLOs.

During our investigation, we found that DeleteUserWorker, which runs nightly via Cronjob through InactiveTokensDeletionCronWorker, triggers AuthorizedProjectsWorker repeatedly around midnight—causing a consistent spike in workload that is approximately 3× larger (in average duration × count) than any other observed usage pattern.

In this graph of the sources of AuthorizedProjectsWorker jobs that exceed 10 seconds, Cronjob was consistently spiking: image

Then of Cronjob, the top contributor by far was DeleteUserWorker. Which I was able to narrow down derived from InactiveTokensDeletionCronWorker image

Since this cleanup is already performed via a nightly Cronjob—i.e. not immediately upon user deletion, we believe it's safe to lower the priority of the UserProjectAccessChangedService invocation within DeleteUserWorker.

This effectively changes the urgency of updating user access from "as soon as possible" to a delay of up to 1 minute.

References

Screenshots or screen recordings

No visual difference.

How to set up and validate locally

  1. Setup test data
user = User.first

organization = Organizations::Organization.first || Organizations::Organization.create!(
  name: "QA Org",
  path: "qa-org-#{SecureRandom.hex(4)}"
)

# Main group to be tested
group_path = "qa-test-group-#{SecureRandom.hex(4)}"
group = Group.create!(
  name: group_path,
  path: group_path,
  organization: organization
)

project_path = "qatestproject-#{SecureRandom.hex(4)}"
project = Project.create!(
  name: project_path,
  path: project_path,
  namespace: group,
  creator: user,
  organization: group.organization
)

group.add_owner(user)

# Create another group that owns the shared project
shared_group_path = "shared-src-group-#{SecureRandom.hex(4)}"
shared_group = Group.create!(
  name: shared_group_path,
  path: shared_group_path,
  organization: organization
)

shared_project_path = "shared-project-#{SecureRandom.hex(4)}"
shared_project = Project.create!(
  name: shared_project_path,
  path: shared_project_path,
  namespace: shared_group,
  creator: user,
  organization: organization
)

# Share the shared project with the main group
ProjectGroupLink.create!(
  project: shared_project,
  group: group,
  group_access: Gitlab::Access::MAINTAINER
)
  1. On app/services/groups/destroy_service.rb, add this pry
if user_ids_for_project_authorizations_refresh.present?
    service = UserProjectAccessChangedService.new(user_ids_for_project_authorizations_refresh)

    binding.pry # add this

    if Feature.enabled?(:resource_tokens_lower_user_access_change_priority, group)
        service.execute(priority: user_project_access_change_priority)
    else
        service.execute
    end
end
  1. Run Destroy service
Groups::DestroyService.new(group, user).execute(user_project_access_change_priority: :medium)

Verify that it would not evoke

service.execute(priority: user_project_access_change_priority)
  1. In rails console enable the feature flag

    Feature.enable(:resource_tokens_lower_user_access_change_priority
  2. Run Destroy Service

Groups::DestroyService.new(group, user).execute(user_project_access_change_priority: :medium)

Verify that it would evoke

service.execute(priority: user_project_access_change_priority)

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Matthew MacRae-Bovell

Merge request reports

Loading