
Put in place measures to avoid addition/proliferation of GitLab upgrade path stops

This issue was created with contributions from @anton and @kballon.

Summary

The GitLab upgrade path is becoming increasingly complex and arbitrary. This creates a poor experience for the system administrators who maintain GitLab environments, and causes unexpected, unplanned downtime and maintenance windows for organisations and their end users.

We should investigate the causes that introduce these seemingly arbitrary upgrade stops, and explore the measures we can take to reduce them or prevent them from being introduced in future versions.

Overview

In the past, GitLab major upgrades were straightforward, and could be summarised as follows:

  1. Upgrade to the latest minor version of the preceding major version.
  2. Upgrade to the "dot zero" release of the next major version (X.0.Z).

This was easy to understand, and upgrade stops in a multi-version upgrade path were predictable.

In the last three major versions, this is no longer the case. Multiple additional upgrade stops at seemingly arbitrary minor versions are now mandated, and skipping any of them leaves the upgrade incomplete.
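To make the contrast concrete, here is a minimal Python sketch of how an upgrade path could be computed from the classic two-step rule plus a list of extra required stops. The function `upgrade_path`, its inputs and the `(major, minor)` tuples are illustrative assumptions, not part of GitLab or its tooling, and patch levels are ignored for simplicity.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int]  # (major, minor); patch levels omitted for simplicity


def upgrade_path(
    current: Version,
    target_major: int,
    last_minor: Dict[int, int],
    required_stops: List[Version],
) -> List[Version]:
    """Return the ordered list of versions to upgrade through.

    Hypothetical inputs, not a GitLab API:
    - last_minor maps each major version to its final minor release.
    - required_stops lists the extra mandatory minor versions.
    """
    path: List[Version] = []
    for major in range(current[0], target_major):
        # Extra stops in this major that are newer than the current version.
        stops = sorted(s for s in required_stops if s[0] == major and s > current)
        final = (major, last_minor[major])
        for stop in stops + [final]:
            # Skip versions already reached and avoid listing a stop twice.
            if stop > current and stop not in path:
                path.append(stop)
        # Classic rule: from the last minor, jump to the "dot zero" release.
        path.append((major + 1, 0))
        current = (major + 1, 0)
    return path
```

For example, `upgrade_path((13, 0), 14, {13: 12}, [(13, 1), (13, 8)])` returns `[(13, 1), (13, 8), (13, 12), (14, 0)]`, which matches the 13 to 14 path listed in this issue once patch levels are added back. With an empty `required_stops`, the function reduces to the original two-step rule.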

Version 11 to 12

0 additional stops: 11.0.Z -> 11.11.8 (last minor ver.) -> 12.0.12

Note: This is the last major version where we kept to the most straightforward upgrade path.

Version 12 to 13

1 additional stop: 12.0.12 -> 12.1 -> 12.10 (last minor ver.) -> 13.0.14

  • 12.1 was introduced as a stop in %13.6 because the ResetMergeStatus background migration needs to access the merge_requests.state column in order to complete when triggered from 12.1+. This column was removed in 12.10 via this commit, so users who upgraded directly from 12.0.Z to 12.10.Z ended up with many stuck ResetMergeStatus background jobs (as per this issue) and were blocked from upgrading to 13.X.
    1. Docs MR: !46632 (merged)
    2. Backport request (declined due to length of time passed): gitlab-org/release/tasks#1753

Version 13 to 14

2 additional stops: 13.0.14 -> 13.1.11 -> 13.8.8 -> 13.12.15 (last minor ver.) -> 14.0.12

  • 13.1 includes a Rails version change from 6.0.3 to 6.0.3.1, which appears to be related to security vulnerabilities in Rails. The Rails upgrade changed CSRF token generation in a way that is not backwards-compatible: GitLab servers running the new Rails version generate CSRF tokens that servers running the older version do not recognize, which could cause non-GET requests to fail on multi-node GitLab installations. This stop appears to have been included only for multi-node installations, and it could technically be skipped for single-node installs.
    1. Docs MR: !34116 (merged)
  • 13.8 includes a background migration to address an issue with duplicate service records. If duplicate services are present, this background migration must complete before a unique index is applied to the services table, which was introduced in GitLab 13.9. Upgrades from GitLab 13.8 and earlier to later versions must include an intermediate upgrade to GitLab 13.8.8 before proceeding.
    1. Docs MR: !68874 (merged)

Version 14 to 15

2 additional stops: 14.0.12 -> 14.3.6 -> 14.9.5 -> 14.10.Z (last minor ver.) -> 15.0.Z

  • 14.3 includes a batched background migration, MigrateMergeRequestDiffCommitUsers, which might take hours or days to complete on larger GitLab instances. 14.5 moved this migration into the foreground, causing a number of large instances to run into unexpected downtime beyond their scheduled maintenance windows. As a result, 14.3.6 was added to the upgrade path, but only ~7 months after its release.
    1. Docs MR: !89621 (merged)
    2. Additional context: #375553 (comment 1117399427)
  • 14.9 includes a batched background migration, BackfillAllProjectNamespaces, which ensures that every record in the projects table has a corresponding record in the namespaces table. 14.10 includes a batched background migration, BackfillNamespaceIdForProjectRoutes, that depends on the former migration having completed in full, which makes 14.9 a required upgrade stop.
    1. Note: This migration might take hours or days to complete on larger GitLab instances.
    2. Docs MR: !86029 (merged)

Version 15 to 16

1 additional stop so far: 15.0.Z -> 15.4.0 -> TBD

Review

The required upgrade stops fall into these categories:

  1. Code changes were introduced that depend on earlier database schema changes (12.1, 13.8).
  2. Code changes were introduced that depend on earlier data manipulation or data consistency work (14.9, 15.4).
    1. A potentially long-running background migration was forced into the foreground (14.3).
  3. A security fix (13.1), although it is unclear whether this stop is strictly required.
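Categories 1 and 2 share a pattern: a change in a later release depends on a background migration from an earlier release having finished. The minimal Python sketch below models that reasoning, under the assumption that each change records which earlier migration it needs. The `Change` model and `required_stops` function are hypothetical, and the labels other than the migration class names mentioned in this issue are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Version = Tuple[int, int]  # (major, minor)


@dataclass
class Change:
    """Hypothetical model of a schema or data change shipped in a release."""
    name: str
    introduced_in: Version
    # Name of an earlier background migration that must have finished
    # before this change can run safely (None if there is no dependency).
    requires_completed: Optional[str] = None


def required_stops(changes: List[Change]) -> Set[Version]:
    """Return the versions users must stop at so dependencies can complete.

    A stop is needed when a change depends on a background migration that
    shipped in an earlier release: users must run that earlier release until
    the migration finishes before moving past it. Dependencies within the
    same release cannot be fixed by an upgrade stop, so they are not reported.
    """
    by_name = {c.name: c for c in changes}
    stops: Set[Version] = set()
    for change in changes:
        if change.requires_completed is None:
            continue
        dep = by_name[change.requires_completed]
        if dep.introduced_in < change.introduced_in:
            stops.add(dep.introduced_in)
    return stops


if __name__ == "__main__":
    examples = [
        Change("ResetMergeStatus", (12, 1)),
        # Hypothetical label for the 12.10 removal of merge_requests.state.
        Change("drop merge_requests.state", (12, 10), "ResetMergeStatus"),
        # Hypothetical labels for the 13.8 dedupe and 13.9 unique index.
        Change("dedupe services", (13, 8)),
        Change("services unique index", (13, 9), "dedupe services"),
        Change("BackfillAllProjectNamespaces", (14, 9)),
        Change("BackfillNamespaceIdForProjectRoutes", (14, 10),
               "BackfillAllProjectNamespaces"),
    ]
    print(sorted(required_stops(examples)))  # prints [(12, 1), (13, 8), (14, 9)]
```

Feeding it the dependencies described in this issue yields the 12.1, 13.8 and 14.9 stops. A check along these lines could, in principle, flag a new stop before a release ships rather than after users hit it; note that it does not model the foregrounding case (14.3) or the security fix (13.1).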

Potential causes

See the description and discussion in the comments.

Other notes

!89621 (comment 980598154)

  1. Since %15.0 we test all background migrations on our database testing pipelines using thin clones. This will help us identify similar issues early, before the migrations are released.
  2. We have switched to using batched background migrations by default and will soon stop using regular background migrations. That will remove the issues with thousands or tens of thousands of jobs being queued, but will not solve the issues with skipping versions.
  3. We are now also inlining batched background migrations. That addresses the issues for small and medium instances, but we also have to continue the discussion on the timeouts.

Brainstorm ways for background migrations to be finalized without introducing a required upgrade step - #357561