Split UpdateMergeRequestsWorker for improved resilience
What does this MR do?
This MR implements a split worker architecture for UpdateMergeRequestsWorker to improve the resilience and observability of merge request refresh operations, addressing the issues described in #554081.
Architecture Overview
The new architecture splits the monolithic MergeRequests::RefreshService into a chain of specialized workers:
UpdateMergeRequestsWorker (Orchestrator)
├── MergeRequests::PrepareRefreshWorker
├── MergeRequests::ProcessMergedCommitsWorker
├── MergeRequests::ReloadDiffsWorker
└── MergeRequests::FinalizeRefreshWorker
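The hand-off order in the tree above can be sketched in plain Ruby. This is an illustration only: it assumes each worker passes control to the next one in the order listed, and the `ChainSketch` module with its methods is a hypothetical name, not code from this MR. Real workers would be Sidekiq jobs enqueued asynchronously rather than walked inline.

```ruby
# Hypothetical, framework-free sketch of the worker chain.
# Assumption: workers run sequentially in the order shown in the tree.
module ChainSketch
  WORKERS = [
    'MergeRequests::PrepareRefreshWorker',
    'MergeRequests::ProcessMergedCommitsWorker',
    'MergeRequests::ReloadDiffsWorker',
    'MergeRequests::FinalizeRefreshWorker'
  ].freeze

  # Returns the worker that follows `worker`, or nil at the end of the chain.
  def self.next_worker(worker)
    index = WORKERS.index(worker)
    raise ArgumentError, "unknown worker: #{worker.inspect}" if index.nil?

    WORKERS[index + 1]
  end

  # Walks the chain from `first` and returns the workers in execution order.
  def self.run(first = WORKERS.first)
    executed = []
    worker = first
    while worker
      executed << worker
      worker = next_worker(worker)
    end
    executed
  end
end
```

Starting `run` from a later worker models resuming the chain after an earlier step has already succeeded.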
Key Components
Workers
- PrepareRefreshWorker: Initial preparation, finding commits, closing MRs with missing source branches, linking LFS objects
- ProcessMergedCommitsWorker: Detection and processing of manually merged commits, caching merge request closing issues
- ReloadDiffsWorker: Diff reloading, suggestion outdating, auto-merge abortion, todo management
- FinalizeRefreshWorker: Notifications, system notes, webhook execution, pipeline refresh
Services
- RefreshContext: Shared context object for passing data between workers with serialization support
- PrepareRefreshService: Business logic for preparation phase
- ProcessMergedCommitsService: Business logic for merged commit processing
- ReloadDiffsService: Business logic for diff reloading and MR state updates
- FinalizeRefreshService: Business logic for notifications and final processing
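To illustrate what "serialization support" might look like for the shared context, here is a minimal sketch. The class name `RefreshContextSketch`, its attribute list, and the `to_h`/`from_h` methods are assumptions made for illustration; the actual RefreshContext in this MR may carry different data. The point is that job arguments need to survive a JSON round trip, so the context is reduced to a hash of plain strings, numbers, and arrays.

```ruby
require 'json'

# Hypothetical sketch of a serializable context object.
# Assumption: the attribute set mirrors typical push-refresh arguments.
class RefreshContextSketch
  ATTRIBUTES = %i[project_id user_id oldrev newrev ref completed_steps].freeze
  attr_reader(*ATTRIBUTES)

  def initialize(project_id:, user_id:, oldrev:, newrev:, ref:, completed_steps: [])
    @project_id = project_id
    @user_id = user_id
    @oldrev = oldrev
    @newrev = newrev
    @ref = ref
    # Store step names as strings so they are unchanged by a JSON round trip.
    @completed_steps = completed_steps.map(&:to_s)
  end

  # Plain hash with string keys, safe to pass as a job argument.
  def to_h
    ATTRIBUTES.to_h { |name| [name.to_s, public_send(name)] }
  end

  # Rebuilds a context from the hash produced by #to_h (or its JSON form).
  def self.from_h(hash)
    new(**hash.transform_keys(&:to_sym))
  end
end
```

A worker would call `to_h` before enqueueing its successor, and the successor would call `from_h` on the argument it receives.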
Benefits
Resilience
- Idempotency: Each worker can be retried independently without duplicate work
- Failure Isolation: If one step fails, others can still proceed or be retried separately
- Step Tracking: Context tracks completed steps to prevent duplicate execution
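One way step tracking can support safe retries is sketched below. `StepTrackerSketch` and `run_once` are hypothetical names, and the sketch assumes a step is recorded as completed only after its block finishes without raising, so a failed step stays eligible for retry while a finished one is skipped.

```ruby
require 'set'

# Hypothetical sketch of step tracking for idempotent retries.
class StepTrackerSketch
  def initialize(completed = [])
    @completed = Set.new(completed.map(&:to_s))
  end

  def completed?(step)
    @completed.include?(step.to_s)
  end

  # Runs the block unless the step is already recorded.
  # The step is recorded only after the block returns without raising.
  def run_once(step)
    return :skipped if completed?(step)

    yield
    @completed << step.to_s
    :executed
  end

  def completed_steps
    @completed.to_a
  end
end
```

A retried worker that wraps each unit of work in `run_once` repeats only the steps that did not finish on the previous attempt.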
Observability
- Granular Metrics: Each worker provides separate performance metrics
- Clear Failure Points: Easier to identify which specific operation is causing issues
- Independent Monitoring: Each worker can be monitored and alerted on separately
Performance
- Queue Distribution: Spread load across different queue priorities
- Reduced Blocking: Shorter individual worker execution times
- Better Resource Management: Different workers can have different resource boundaries
Feature Flag
The implementation is controlled by the split_up_update_mr_worker feature flag:
- Enabled: Uses the new split worker architecture
- Disabled: Falls back to the original RefreshService flow
- Default: Disabled for safe rollout
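The gating described above can be sketched as a single branch. `FlagStoreSketch` and `refresh_path` are hypothetical stand-ins so the example runs on its own; in the GitLab codebase the check would normally go through the existing feature flag helper, and which actor (for example, the project) the flag is scoped to is not stated in this description.

```ruby
# Hypothetical stand-in for a feature flag store; flags default to disabled.
class FlagStoreSketch
  def initialize(enabled_flags = [])
    @enabled = enabled_flags.map(&:to_sym)
  end

  def enabled?(name)
    @enabled.include?(name.to_sym)
  end
end

# Chooses the refresh path based on the flag named in this MR.
def refresh_path(flags)
  if flags.enabled?(:split_up_update_mr_worker)
    :split_workers   # new chain of specialized workers
  else
    :refresh_service # original RefreshService flow
  end
end
```

Because an unset flag reads as disabled, existing behaviour is preserved until the flag is explicitly turned on.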
Rollout Plan
- Phase 1: Deploy behind feature flag (this MR)
- Phase 2: Enable for select projects to validate functionality
- Phase 3: Gradual rollout with monitoring
- Phase 4: Full rollout and remove feature flag
Testing
- Comprehensive unit tests for RefreshContext
- Worker tests to be added in follow-up MRs
- Integration tests to validate end-to-end functionality
Related Issues
Closes #554081
Checklist
- Feature flag added with appropriate configuration
- Split worker architecture implemented
- Service classes extracted from RefreshService
- Context object for worker chain communication
- Worker queue configuration updated
- Basic tests added for core functionality
- Integration tests (follow-up)
- Performance benchmarking (follow-up)
- Documentation updates (follow-up)