Split UpdateMergeRequestsWorker using event-driven architecture for improved resilience
What does this MR do and why?
This MR implements an event-driven hybrid approach to split the UpdateMergeRequestsWorker into smaller, more resilient workers. This addresses the issues mentioned in #554081 (closed) where MRs can get stuck in broken states due to worker failures.
Problem
The current UpdateMergeRequestsWorker is a monolithic worker that:
- Can leave MRs in broken states if interrupted by errors
- Repeats work that was already done when it is retried after Redis connection issues
- Is difficult to debug due to lack of observability
- Can block queues with long-running operations
Solution
Split the worker into an event-driven architecture with:
- Phase 1: Parallel independent operations (git analysis, branch cleanup, LFS linking)
- Phase 2: Sequential heavy operation (diff reload) that depends on git analysis
- Phase 3: Parallel post-processing (notifications, webhooks) that depends on diff reload
Architecture
UpdateMergeRequestsWorker (publishes MergeRequestRefreshStarted)
↓
Phase 1 (Parallel):
├── MergeRequestGitWorker (publishes CommitsAnalyzed)
├── MergeRequestBranchWorker (publishes BranchesProcessed)
└── MergeRequestLfsWorker (publishes LfsLinked)
↓
Phase 2 (Sequential, waits for CommitsAnalyzed):
└── MergeRequestDiffWorker (publishes DiffsReloaded)
↓
Phase 3 (Parallel, waits for DiffsReloaded):
├── MergeRequestSuggestionWorker
├── MergeRequestNotificationWorker
└── MergeRequestWebHookWorker
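The event flow above can be sketched as a small in-process publish/subscribe chain. This is a minimal illustration only, not the code in this MR: `SketchEventBus`, `build_refresh_pipeline`, the symbol event names, and the payload keys are all hypothetical stand-ins for whatever event mechanism the real workers use.

```ruby
# Hypothetical in-process event bus, used only to illustrate the phase ordering.
class SketchEventBus
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a handler that runs whenever event_name is published.
  def subscribe(event_name, &handler)
    @subscribers[event_name] << handler
  end

  # Deliver the payload to every handler subscribed to event_name.
  def publish(event_name, payload)
    @subscribers[event_name].each { |handler| handler.call(payload) }
  end
end

# Wires up the three phases. Each handler stands in for one worker and
# records its name in log instead of doing real work.
def build_refresh_pipeline(bus, log)
  # Phase 1: three independent handlers react to the same start event.
  bus.subscribe(:merge_request_refresh_started) do |payload|
    log << :git
    bus.publish(:commits_analyzed, payload)
  end
  bus.subscribe(:merge_request_refresh_started) do |payload|
    log << :branch
    bus.publish(:branches_processed, payload)
  end
  bus.subscribe(:merge_request_refresh_started) do |payload|
    log << :lfs
    bus.publish(:lfs_linked, payload)
  end

  # Phase 2: the diff reload runs only after commits have been analyzed.
  bus.subscribe(:commits_analyzed) do |payload|
    log << :diff
    bus.publish(:diffs_reloaded, payload)
  end

  # Phase 3: post-processing runs only after diffs have been reloaded.
  %i[suggestion notification web_hook].each do |name|
    bus.subscribe(:diffs_reloaded) { |_payload| log << name }
  end
end

log = []
bus = SketchEventBus.new
build_refresh_pipeline(bus, log)
bus.publish(:merge_request_refresh_started, { project_id: 1, ref: "refs/heads/main" })
```

Because this sketch delivers events synchronously, the "parallel" handlers run one after another. It only demonstrates the dependency ordering: the diff step never runs before git analysis, and post-processing never runs before the diff step.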
Benefits
- Resilience: Each worker is idempotent and can retry independently
- Performance: Independent operations run in parallel
- Observability: Clear event flow makes debugging easier
- Extensibility: Easy to add new processors without changing existing code
- Queue management: Heavy operations don't block lighter ones
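To make the resilience point concrete, here is one way a step can be made safe to retry. This is a sketch under assumptions, not the implementation in this MR: `IdempotentStepSketch` is a hypothetical name, and the in-memory `Set` stands in for a shared store (for example Redis or the database) that a real worker would need.

```ruby
require "set"

# Hypothetical helper: runs a step at most once per (merge_request_id, newrev)
# pair, and records completion only after the step succeeds.
class IdempotentStepSketch
  attr_reader :runs

  def initialize
    @completed = Set.new # assumption: a real worker would use a shared store
    @runs = 0
  end

  def perform(merge_request_id, newrev)
    key = [merge_request_id, newrev]
    return :skipped if @completed.include?(key)

    yield
    # Reached only if the block did not raise, so a failed attempt
    # leaves no completion marker and can be retried.
    @runs += 1
    @completed << key
    :done
  end
end

step = IdempotentStepSketch.new

begin
  step.perform(1, "abc123") { raise "simulated Redis connection error" }
rescue RuntimeError
  # The failed attempt was not recorded as complete.
end

retry_result = step.perform(1, "abc123") { :work }     # the retry runs the step
duplicate_result = step.perform(1, "abc123") { :work } # a duplicate job is skipped
```

Recording completion after the work, rather than before, is the design choice that matters here: a crash mid-step leads to a retry instead of a silently skipped step.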
Migration Strategy
- Rollout controlled by the split_update_merge_requests_worker feature flag
- Backward compatibility maintained
- Gradual migration with monitoring
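The flag-gated rollout can be pictured as a single branch at the entry point. This is a minimal sketch: only the flag name `split_update_merge_requests_worker` comes from this MR. `SketchFeature` is a hypothetical stand-in for the real feature flag API so the example runs on its own, and `refresh_merge_requests` with its return values is invented for illustration.

```ruby
# Hypothetical stand-in for a feature flag store, so the sketch is self-contained.
module SketchFeature
  @enabled = {}

  def self.enable(name)
    @enabled[name] = true
  end

  def self.disable(name)
    @enabled.delete(name)
  end

  def self.enabled?(name)
    @enabled.fetch(name, false)
  end
end

# Hypothetical entry point: picks the new event-driven path only when the
# flag is on, and otherwise keeps the existing behaviour untouched.
def refresh_merge_requests(project_id)
  if SketchFeature.enabled?(:split_update_merge_requests_worker)
    [:event_published, project_id]
  else
    [:legacy_refresh, project_id]
  end
end

before_rollout = refresh_merge_requests(1) # legacy path while the flag is off
SketchFeature.enable(:split_update_merge_requests_worker)
after_rollout = refresh_merge_requests(1)  # new path once the flag is on
SketchFeature.disable(:split_update_merge_requests_worker)
```

Keeping the legacy path as the default branch means that disabling the flag is enough to roll back.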
Related to #554081 (closed)
How to set up and validate locally
- Enable the feature flag:
  Feature.enable(:split_update_merge_requests_worker)
- Push to a branch and observe the new worker execution in Sidekiq logs
- Verify MR refresh functionality works correctly
MR acceptance checklist
This MR addresses the customer-reported issue while providing a foundation for improved worker resilience across GitLab.