Split UpdateMergeRequestsWorker for improved resilience (!200681)

What does this MR do?

This MR implements a split worker architecture for UpdateMergeRequestsWorker to improve resilience and observability of merge request refresh operations, addressing the issues described in #554081 (closed).

Architecture Overview

The new architecture splits the monolithic MergeRequests::RefreshService into a chain of specialized workers:

UpdateMergeRequestsWorker (Orchestrator)
├── MergeRequests::PrepareRefreshWorker
├── MergeRequests::ProcessMergedCommitsWorker
├── MergeRequests::ReloadDiffsWorker
└── MergeRequests::FinalizeRefreshWorker
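
To illustrate the hand-off between steps, here is a minimal sketch in plain Ruby. It assumes the four workers run strictly in the order listed and that each one enqueues its successor; the MR description does not confirm the chaining mechanism. `FakeQueue` is a hypothetical in-memory stand-in for Sidekiq, and the context hash is a placeholder.

```ruby
# Worker names are taken from this MR; the sequential ordering is an assumption.
STEPS = %w[
  MergeRequests::PrepareRefreshWorker
  MergeRequests::ProcessMergedCommitsWorker
  MergeRequests::ReloadDiffsWorker
  MergeRequests::FinalizeRefreshWorker
].freeze

# Hypothetical in-memory queue standing in for Sidekiq's job queue.
class FakeQueue
  attr_reader :log

  def initialize
    @jobs = []
    @log = []
  end

  # Stand-in for SomeWorker.perform_async(context).
  def push(worker_name, context)
    @jobs << [worker_name, context]
  end

  # Runs every queued job; each finished step enqueues the next one, if any.
  def drain
    until @jobs.empty?
      worker_name, context = @jobs.shift
      @log << worker_name
      next_step = STEPS[STEPS.index(worker_name) + 1]
      push(next_step, context) if next_step
    end
  end
end

# The orchestrator (UpdateMergeRequestsWorker) would only enqueue the first step.
queue = FakeQueue.new
queue.push(STEPS.first, { 'project_id' => 1 })
queue.drain
```

In this sketch `queue.log` ends up listing the four workers in chain order, which is the property the orchestrator relies on.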

Key Components

Workers

PrepareRefreshWorker: Initial preparation, finding commits, closing MRs with missing source branches, linking LFS objects
ProcessMergedCommitsWorker: Detection and processing of manually merged commits, caching merge request closing issues
ReloadDiffsWorker: Diff reloading, suggestion outdating, auto-merge abortion, todo management
FinalizeRefreshWorker: Notifications, system notes, webhook execution, pipeline refresh

Services

RefreshContext: Shared context object for passing data between workers with serialization support
PrepareRefreshService: Business logic for preparation phase
ProcessMergedCommitsService: Business logic for merged commit processing
ReloadDiffsService: Business logic for diff reloading and MR state updates
FinalizeRefreshService: Business logic for notifications and final processing
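
A context object that travels between Sidekiq jobs has to survive a JSON round trip, because job arguments are serialized as JSON. The following is a rough sketch of what such an object could look like; the class name `RefreshContextSketch`, its fields, and the `to_h`/`from_h` methods are illustrative assumptions, not the actual `RefreshContext` API from this MR.

```ruby
require 'json'

# Hypothetical sketch; field names and methods are assumed for illustration.
class RefreshContextSketch
  attr_reader :project_id, :user_id, :oldrev, :newrev, :ref

  def initialize(project_id:, user_id:, oldrev:, newrev:, ref:, completed_steps: [])
    @project_id = project_id
    @user_id = user_id
    @oldrev = oldrev
    @newrev = newrev
    @ref = ref
    @completed_steps = completed_steps.map(&:to_s)
  end

  # String keys and primitive values only, so the hash is JSON-safe.
  def to_h
    {
      'project_id' => project_id,
      'user_id' => user_id,
      'oldrev' => oldrev,
      'newrev' => newrev,
      'ref' => ref,
      'completed_steps' => @completed_steps.dup
    }
  end

  # Rebuilds the context from the hash a worker receives as its argument.
  def self.from_h(hash)
    new(**hash.transform_keys(&:to_sym))
  end

  def mark_completed(step)
    @completed_steps << step.to_s unless completed?(step)
  end

  def completed?(step)
    @completed_steps.include?(step.to_s)
  end
end

context = RefreshContextSketch.new(
  project_id: 1, user_id: 2, oldrev: 'abc', newrev: 'def', ref: 'refs/heads/main'
)
context.mark_completed(:prepare)

# Simulate what happens between two workers: serialize, then deserialize.
payload = JSON.generate(context.to_h)
restored = RefreshContextSketch.from_h(JSON.parse(payload))
```

Restricting the serialized form to strings, integers, and arrays keeps the payload small and avoids passing ActiveRecord objects through the queue.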

Benefits

Resilience

Idempotency: Each worker can be retried independently without duplicate work
Failure Isolation: If one step fails, others can still proceed or be retried separately
Step Tracking: Context tracks completed steps to prevent duplicate execution
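
As a rough illustration of how step tracking can make retries safe, the sketch below records a step as completed only after its block succeeds, and skips it on any later attempt. The `run_step` helper and the step names are hypothetical; for brevity the example replays the whole sequence on retry, whereas the split workers would each be retried on their own.

```ruby
# Hypothetical helper: runs the block once per step name.
# The step is recorded only after the block finishes without raising.
def run_step(completed_steps, step)
  return :skipped if completed_steps.include?(step)

  yield
  completed_steps << step
  :executed
end

completed = []     # would live in the shared context
side_effects = []  # stands in for notifications, diff reloads, and so on
attempts = 0

chain = lambda do
  run_step(completed, 'prepare') { side_effects << 'prepare' }
  run_step(completed, 'reload_diffs') do
    attempts += 1
    raise 'transient failure' if attempts == 1

    side_effects << 'reload_diffs'
  end
end

begin
  chain.call
rescue RuntimeError
  # A retry re-enters the chain; 'prepare' is skipped, 'reload_diffs' runs again.
  chain.call
end
```

After the retry, each side effect has happened exactly once even though the sequence was entered twice.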

Observability

Granular Metrics: Each worker provides separate performance metrics
Clear Failure Points: Easier to identify which specific operation is causing issues
Independent Monitoring: Each worker can be monitored and alerted on separately

Performance

Queue Distribution: Spread load across different queue priorities
Reduced Blocking: Shorter individual worker execution times
Better Resource Management: Different workers can have different resource boundaries

Feature Flag

The implementation is controlled by the split_up_update_mr_worker feature flag:

Enabled: Uses the new split worker architecture
Disabled: Falls back to the original RefreshService flow
Default: Disabled for safe rollout
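
The branching can be pictured with the small sketch below. `FeatureStub` is a hypothetical stand-in for GitLab's feature flag check so that the example runs on its own; whether the real check is scoped to a project or another actor is not stated in this description, so no actor is modeled here.

```ruby
# Hypothetical stand-in for the feature flag lookup; flags default to disabled.
module FeatureStub
  @enabled = {}

  def self.enable(flag)
    @enabled[flag] = true
  end

  def self.disable(flag)
    @enabled.delete(flag)
  end

  def self.enabled?(flag)
    @enabled.fetch(flag, false)
  end
end

# Returns which path the orchestrator would take.
def refresh_path
  if FeatureStub.enabled?(:split_up_update_mr_worker)
    :split_workers    # new chain of specialized workers
  else
    :refresh_service  # original RefreshService flow
  end
end

path_before = refresh_path
FeatureStub.enable(:split_up_update_mr_worker)
path_after = refresh_path
```

Because the flag defaults to disabled, deploying the code changes nothing until the flag is switched on.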

Rollout Plan

Phase 1: Deploy behind feature flag (this MR)
Phase 2: Enable for select projects to validate functionality
Phase 3: Gradual rollout with monitoring
Phase 4: Full rollout and remove feature flag

Testing

Comprehensive unit tests for RefreshContext
Worker tests to be added in follow-up MRs
Integration tests to validate end-to-end functionality

Related Issues

Closes #554081 (closed)

Checklist

Feature flag added with appropriate configuration
Split worker architecture implemented
Service classes extracted from RefreshService
Context object for worker chain communication
Worker queue configuration updated
Basic tests added for core functionality
Integration tests (follow-up)
Performance benchmarking (follow-up)
Documentation updates (follow-up)
