Fast Forward Merge Trains MRs can be marked 'merged' while commits are missing from the target
Summary
Race condition in Fast Forward Merge Trains causes MRs to be marked as "merged" while their commits are missing from the target branch.
Steps to reproduce
- Enable Fast Forward merge trains on a project
- Create a first MR and add it to the merge train
- Create a second MR and add it to the merge train
- Immediately after, create a third MR and add it to the merge train (timing is important)
- Important detail from zendesk: > A third MR was added, though instead of running the
merge train
pipeline for its commit, it seems to have ran a pipeline for the commit of the second MR.
- Important detail from zendesk: > A third MR was added, though instead of running the
- Wait for the merge train to process all MRs
- Observe that all MRs are marked as "Merged" by GitLab
- Check the target branch git history - the commit from the third MR is missing despite showing as merged in the UI
Example Project
This is occurring for a customer in https://gitlab.zendesk.com/agent/tickets/619183 (internal use only). As stated by the customer:
So we have this version of Gitlab deployed from last couple months and afaik this is the first time we hit this issue. That's why we think it as a race condition, reason being we have 1000+ developers working on a monorepo. So you can imagine the number of mr's and commits happening at a time.
What is the current bug behavior?
When multiple MRs are added to a Fast Forward merge train in rapid succession, the third MR is marked as "merged" by GitLab, but its commit and changes are missing from the target branch. Additionally, the commit from the second MR incorrectly references both the second and third MRs as its parents, even though this commit was not part of the third MR.
What is the expected correct behavior?
Either all MRs should be properly merged with their commits included in the target branch, or if a merge fails, the MR should be marked as failed rather than incorrectly showing as "merged" while its changes are missing.
What the system needs to wait for but doesn't properly synchronize:
- MR2's complete merging process to finish before creating MR3's pipeline
- The target branch to be updated with MR2's changes before processing MR3
- Verification that MR3's commit is compatible with the new target branch state
Relevant logs and/or screenshots
When this occurs, the git history of the target branch shows that the commit from the third MR is missing, despite the UI showing it as merged. The system does not generate any error logs about this condition, making it a particularly dangerous issue.
Flowchart:
flowchart TD
CR2("Create Ref for MR2 based on MR1\nee/app/services/merge_trains/create_ref_service.rb") --> CP2("Create Pipeline for MR2\nee/app/services/merge_trains/create_pipeline_service.rb")
CP2 --> MR3("MR3 Added to Train")
subgraph "Race Condition Zone"
direction TB
MR3 --> CR3("Create Ref for MR3 based on MR2\nee/app/services/merge_trains/create_ref_service.rb\n@merge_request.merge_params['train_ref'] = create_ref_result...")
CP2 --> PipeRun2("MR2's Pipeline Runs")
PipeRun2 --> Success2{"Pipeline Succeeds?"}
Success2 -->|Yes| MergeStart2("Begin Merging MR2\nee/app/services/auto_merge/merge_train_service.rb")
MergeStart2 --> RefCheck2("Check source_sha in target branch\nee/app/services/merge_requests/merge_strategies/from_train_ref.rb\nsource_sha != repository.commit(...)")
CR3 --> CP3("Create Pipeline for MR3\nee/app/services/merge_trains/create_pipeline_service.rb\ncreate_train_ref(merge_request, previous_ref, create_mergeable_ref)")
RefCheck2 -->|Race!| FF2("Fast Forward Merge of MR2\nee/app/services/merge_requests/merge_strategies/from_train_ref.rb\nrepository.ff_merge(current_user, source_sha, ...)")
FF2 --> MarkMerged2("Mark MR2 as Merged\napp/services/merge_requests/post_merge_service.rb\nmerge_request.mark_as_merged")
CP3 -->|Timing Issue| PipeRun3Wrong("MR3 Pipeline Created with MR2's commit\nee/app/services/merge_trains/create_pipeline_service.rb")
MarkMerged2 --> RefreshMT["Refresh Merge Train\nee/app/services/merge_trains/refresh_service.rb\n(Calls RefreshWorker)"]
subgraph "Deduplication Zone - Only Handles One Part"
direction TB
RefreshMT --> RefreshWorker["MergeTrains::RefreshWorker\nee/app/workers/merge_trains/refresh_worker.rb\ndeduplicate :until_executed, if_deduplicated: :reschedule_once"]
RefreshWorker --> RefreshMR3["Refresh MR3\nee/app/services/merge_trains/refresh_merge_request_service.rb\nmergeable_sha_and_message?(car)"]
end
RefreshMR3 --> MarkMerged3("Mark MR3 as Merged despite commit missing\napp/services/merge_requests/post_merge_service.rb\nmerge_request.mark_as_merged")
end
MarkMerged3 --> FinalState("Target Branch Missing MR3's Changes")
classDef normal fill:#e0f7fa,stroke:#00acc1,stroke-width:2px;
classDef raceCondition fill:#ffebee,stroke:#f44336,stroke-width:2px;
classDef deduplication fill:#e8f5e9,stroke:#43a047,stroke-width:2px,stroke-dasharray: 5 5;
class CR2,CP2,MR3 normal;
class CR3,PipeRun2,Success2,MergeStart2,RefCheck2,CP3,FF2,MarkMerged2,PipeRun3Wrong,RefreshMT,MarkMerged3,FinalState raceCondition;
class RefreshWorker,RefreshMR3 deduplication;
Output of checks
This bug can happen on both self-managed instances and GitLab.com.
Possible fixes
The race condition appears to be in the merge train implementation, specifically in how fast forward merge trains handle commit tracking and merging. Based on code analysis, the issue may lie in the following components:
The race condition appears to be in the merge train implementation, specifically in how fast forward merge trains handle commit tracking and merging. Based on code analysis, the issue may lie in the following components:
-
In
ee/app/services/merge_trains/create_ref_service.rb
, the race occurs when the third MR's ref is created while the second MR is still being processed:def update_merge_params_train_ref(create_ref_result) @merge_request.merge_params['train_ref'] = create_ref_result .payload .slice(:commit_sha, :merge_commit_sha, :squash_commit_sha) .stringify_keys @merge_request.save end
-
In
ee/app/services/merge_requests/merge_strategies/from_train_ref.rb
, theoutdated_source_sha?
validation check is insufficient when operations happen in quick succession:def outdated_source_sha? # This guard against races where the train ref is updated by another # process during the merge. This complements the mergability check, # which should cover changes in the source branch. source_sha != repository.commit(merge_request.train_ref_path)&.sha || source_sha != train_ref_merge_params[:commit_sha] end
-
In
ee/app/services/merge_trains/refresh_service.rb
, there's inadequate synchronization between merge operations and the train refresh process:# This class is to refresh all merge requests on the merge train that # the given merge request belongs to. # # It performs a sequential update on all merge requests on the train. # Multiple runs with the same project and branch should not take place concurrently # NOTE: To prevent concurrent refreshes, `MergeTrains::RefreshWorker` implements a locking mechanism through the # `deduplicate :until_executed, if_deduplicated: :reschedule_once` option within the worker
The refresh service implements worker-level locking to prevent concurrent refresh operations, but this is inadequate for synchronizing the entire merge train workflow. The race condition occurs between separate asynchronous processes (merging of MR2, pipeline creation for MR3, and refreshing the merge train), where the refresh service only synchronizes one part of this complex interaction chain, leaving the system vulnerable to race conditions when operations happen in rapid succession.
The fix would likely involve:
- Adding proper synchronization mechanisms to ensure atomic operations
- Implementing verification that a merge request's commit is actually in the target branch before marking it as merged
- Creating a dedicated service for handling fast forward merge train operations that maintains integrity of the commit history