Auto cancel of a Child Pipeline in a Merge Train intermittently not successful
Summary
When a job fails in a merge train, the remaining pipelines are canceled. Sometimes, however, a child pipeline is not canceled and continues to run. The exception "Cannot transition status via :cancel from :canceled" is seen in the logs.
Customer description: Our project team is facing an issue in GitLab CI/CD where cancellation of parent pipelines in a merge train does not propagate to their respective child pipelines. This behavior occurs specifically when using resource groups to prevent concurrent execution of pipelines. The issue leads to race conditions, concurrent job execution, and manual intervention to resolve locks in pipeline environments.
The issue "Closing a MR doesn't cancel pipelines that contain child pipelines" is very similar and may have the same root cause.
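The exception message suggests that a cancel event reached a bridge job that was already canceled. The following minimal Python sketch illustrates that failure mode. It is an illustration only and not GitLab's implementation: the Bridge class, its set of cancelable statuses, and cancel_if_possible are hypothetical names chosen for this example.

```python
class InvalidTransitionError(Exception):
    """Raised when an event is not allowed from the current status."""


class Bridge:
    # Hypothetical set of statuses from which `cancel` is allowed.
    CANCELABLE = {"created", "waiting_for_resource", "pending", "running"}

    def __init__(self, status="running"):
        self.status = status

    def cancel(self):
        # Strict transition: canceling an already canceled bridge raises.
        if self.status not in self.CANCELABLE:
            raise InvalidTransitionError(
                f"Cannot transition status via :cancel from :{self.status}"
            )
        self.status = "canceled"

    def cancel_if_possible(self):
        # Tolerant variant: reports a no-op instead of raising.
        if self.status not in self.CANCELABLE:
            return False
        self.cancel()
        return True


bridge = Bridge()
bridge.cancel()        # first cancel succeeds
try:
    bridge.cancel()    # duplicate cancel raises
except InvalidTransitionError as error:
    print(error)
```

If two code paths race to cancel the same bridge, the strict variant raises for whichever arrives second, while the tolerant variant treats the second call as a no-op. Whether the raised error is what stops the cancellation from reaching the child pipeline is not established by this report.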
Steps to reproduce
- Create the .gitlab-ci.yml and child.yml files in the root directory of the repo on some branch, called branch here as an example.
- Create a new branch and add new files/changes to the repo to be merged (in my case I created file1.txt with the contents foobar).
- I did this step 3 times, i.e. I created branch1, branch2, and branch3 with file1.txt, file2.txt, and file3.txt respectively, so that I had a queue of merge train items.
- Create a merge request for branch1 to branch
- Create a merge request for branch2 to branch
- Create a merge request for branch3 to branch
- Click the merge button for each open merge request.
- All pipelines will start.
- Branch1 pipeline will continue to generate child pipelines
- Branch2 and branch3 will queue at the parent_trigger stage.
- Branch1 will fail on child_pipeline2 when exit 1 is reached.
- Branch2 and branch3 will cancel and restart, but branch2 will still generate its child pipelines despite the cancellation and execute them.
######### .gitlab-ci.yml
stages:
  - startup
  - execute

workflow:
  auto_cancel:
    on_new_commit: interruptible

dummy_job:
  stage: startup
  script:
    - exit 0;
  rules:
    - if: $CI_COMMIT_BRANCH
      when: always
    - when: never

parent_trigger:
  interruptible: true
  stage: execute
  trigger:
    include:
      - local: child.yml
    strategy: depend
  resource_group: main
  rules:
    - if: $CI_MERGE_REQUEST_EVENT_TYPE == 'merge_train'
      when: always
    - when: never
######### child.yml
stages:
  - execute
  - validate

child_pipeline:
  rules:
    - if: $CI_PIPELINE_SOURCE == "parent_pipeline"
      when: always
    - when: never
  interruptible: true
  stage: execute
  script:
    - sleep 20;
    - exit 0;

child_pipeline2:
  rules:
    - if: $CI_PIPELINE_SOURCE == "parent_pipeline"
      when: always
    - when: never
  interruptible: true
  stage: validate
  script:
    - sleep 20;
    - exit 1;
Example Project
https://gitlab.com/lmacrae_ultimate_group/zd592371
What is the current bug behavior?
When a pipeline fails in a merge train, the remaining pipelines are canceled automatically. Intermittently, however, their child pipelines are not canceled and run anyway.
What is the expected correct behavior?
When a pipeline fails in a merge train, there should be automatic pipeline cancellation for the remaining pipelines and child pipelines in the merge train.
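One way to check for a violation of this expectation is to compare a canceled parent pipeline with the downstream pipelines of its bridge jobs. The sketch below assumes input shaped like the response of the pipeline bridges endpoint of the Jobs API (GET /projects/:id/pipelines/:pipeline_id/bridges), where each bridge carries a downstream_pipeline object with id and status. The function name, the set of "active" statuses, and the sample data are assumptions made for illustration.

```python
# Statuses treated as "still active" in this check (an assumption of this sketch).
ACTIVE_STATUSES = {
    "created", "waiting_for_resource", "preparing", "pending", "running",
}


def find_uncanceled_children(parent_status, bridges):
    """Return IDs of downstream pipelines still active under a canceled parent."""
    if parent_status != "canceled":
        return []
    leftover = []
    for bridge in bridges:
        downstream = bridge.get("downstream_pipeline")
        if downstream and downstream.get("status") in ACTIVE_STATUSES:
            leftover.append(downstream["id"])
    return leftover


# Made-up sample data, loosely mirroring the IDs in the log of this report.
sample_bridges = [
    {"id": 111, "name": "parent_trigger",
     "downstream_pipeline": {"id": 82, "status": "running"}},
    {"id": 112, "name": "other_trigger", "downstream_pipeline": None},
]
print(find_uncanceled_children("canceled", sample_bridges))
```

An empty result for every canceled merge train pipeline would match the expected behavior; any returned ID points at a child pipeline that survived the cancellation.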
Relevant logs and/or screenshots
{
  "severity": "ERROR",
  "time": "2025-02-03T19:41:15.825Z",
  "correlation_id": "01JK6N9G9HQPSFG09KNDQRP5ME",
  "meta.caller_id": "Ci::PipelineBridgeStatusWorker",
  "meta.remote_ip": "35.237.207.225",
  "meta.feature_category": "continuous_integration",
  "meta.user": "root",
  "meta.user_id": 1,
  "meta.project": "my-awesome-group/zd592371",
  "meta.root_namespace": "my-awesome-group",
  "meta.client_id": "user/1",
  "meta.pipeline_id": 81,
  "meta.job_id": 115,
  "meta.root_caller_id": "PUT /api/:version/jobs/:id",
  "exception.class": "Ci::Bridge::InvalidTransitionError",
  "exception.message": "Cannot transition status via :cancel from :canceled (Reason(s): Status cannot transition via \"cancel\")",
  "user.username": "root",
  "tags.program": "sidekiq",
  "tags.locale": "en",
  "tags.feature_category": "continuous_integration",
  "tags.correlation_id": "01JK6N9G9HQPSFG09KNDQRP5ME",
  "extra.sidekiq": {
    "retry": 3,
    "queue": "default",
    "version": 0,
    "store": null,
    "queue_namespace": "pipeline_default",
    "args": [
      "82"
    ],
    "class": "Ci::PipelineBridgeStatusWorker",
    "jid": "cd896e4baf1fffbed65d266b",
    "created_at": 1738611675.5029118,
    "correlation_id": "01JK6N9G9HQPSFG09KNDQRP5ME",
    "meta.caller_id": "PipelineProcessWorker",
    "meta.remote_ip": "35.237.207.225",
    "meta.feature_category": "continuous_integration",
    "meta.user": "root",
    "meta.user_id": 1,
    "meta.project": "my-awesome-group/zd592371",
    "meta.root_namespace": "my-awesome-group",
    "meta.client_id": "user/1",
    "meta.pipeline_id": 81,
    "meta.job_id": 115,
    "meta.root_caller_id": "PUT /api/:version/jobs/:id",
    "worker_data_consistency": "always",
    "idempotency_key": "resque:gitlab:duplicate:default:12b8142b10303d4ad90b17031a9fa4f9cd2a8414e66cd3720a8fbd5d5645d8f4",
    "size_limiter": "validated",
    "enqueued_at": 1738611675.5232136
  },
  "extra.bridge_id": 111,
  "extra.downstream_pipeline_id": 82
}
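To look for further occurrences, entries like the one above can be filtered out of JSON-formatted logs. The following is a small sketch, assuming one JSON object per line and the field names that appear in this log entry; the function name is illustrative, and the sample line only repeats values from this report.

```python
import json


def invalid_cancel_events(lines):
    """Yield (bridge_id, downstream_pipeline_id, jid) for rejected cancel events."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        if entry.get("exception.class") != "Ci::Bridge::InvalidTransitionError":
            continue
        if "via :cancel" not in entry.get("exception.message", ""):
            continue
        sidekiq = entry.get("extra.sidekiq") or {}
        yield (
            entry.get("extra.bridge_id"),
            entry.get("extra.downstream_pipeline_id"),
            sidekiq.get("jid"),
        )


# Sample line built from the values in the log entry above.
sample_line = json.dumps({
    "exception.class": "Ci::Bridge::InvalidTransitionError",
    "exception.message": "Cannot transition status via :cancel from :canceled",
    "extra.sidekiq": {"jid": "cd896e4baf1fffbed65d266b"},
    "extra.bridge_id": 111,
    "extra.downstream_pipeline_id": 82,
})
for event in invalid_cancel_events([sample_line, "not json"]):
    print(event)
```

The returned bridge and downstream pipeline IDs can then be compared against the pipelines that kept running after the merge train was canceled.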
Output of checks
This bug happens on GitLab.com
Results of GitLab environment info
Customer version is 16.11.10. Reproduced on 16.11.10 with the 1k reference architecture.
For installations with omnibus-gitlab package, run and paste the output of:
root@refarch-1k-env-6e12b860-gitlab-rails-1:/var/log/gitlab# sudo gitlab-rake gitlab:env:info

System information
System: Ubuntu 20.04
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 3.1.4p223
Gem Version: 3.5.7
Bundler Version: 2.5.8
Rake Version: 13.0.6
Redis Version: 7.0.15
Sidekiq Version: 7.1.6
Go Version: unknown

GitLab information
Version: 16.11.10-ee
Revision: f6896a3182a
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 14.11
URL: https://refarch-1k-env-6e12b860.env-6e12b860.gcp.gitlabsandbox.net
HTTP Clone URL: https://refarch-1k-env-6e12b860.env-6e12b860.gcp.gitlabsandbox.net/some-group/some-project.git
SSH Clone URL: ssh://git@refarch-1k-env-6e12b860.env-6e12b860.gcp.gitlabsandbox.net:2222/some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version: 14.35.0
Repository storages:
- default: unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell

Gitaly
- default Address: unix:/var/opt/gitlab/gitaly/gitaly.socket
- default Version: 16.11.10
- default Git Version: 2.43.5
Results of GitLab application Check
root@refarch-1k-env-6e12b860-gitlab-rails-1:/var/log/gitlab# sudo gitlab-rake gitlab:check SANITIZE=true

Checking GitLab subtasks ...

Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 14.35.0 ? ... OK (14.35.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished

Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished

Checking Sidekiq ...
Sidekiq: ... Running? ... yes
Number of Sidekiq processes (cluster/worker) ... 1/1
Checking Sidekiq ... Finished

Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished

Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished

Checking GitLab App ...
Database config exists? ... yes
Tables are truncated? ... skipped
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Cable config exists? ... yes
Resque config exists? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet)
Systemd unit files or init script exist? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Systemd unit files or init script up-to-date? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Projects have namespace: ... 2/1 ... yes
Redis version >= 6.2.14? ... yes
Ruby version >= 3.0.6 ? ... yes (3.1.4)
Git user has default SSH configuration? ... yes
Active users: ... 1
Is authorized keys file accessible? ... skipped (authorized keys not enabled)
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Elasticsearch version 7.x-8.x or OpenSearch version 1.x ... skipped (Advanced Search is disabled)
All migrations must be finished before doing a major upgrade ... skipped (Advanced Search is disabled)
Checking GitLab App ... Finished

Checking GitLab subtasks ... Finished