Migration framework marks a task as done if it is disabled by a FF

Raised from https://gitlab.slack.com/archives/C7XJA8FSQ/p1753812091063159.

We have a migration task registered here, currently controlled by a feature flag (FF).

After enabling the FF, we observed that the migration did not run as expected. On investigation, it appears that each migration is assigned a state with a doneCh channel. When a migration is registered but disabled via a FF, the corresponding state is initialized with a closed doneCh. As a result, the migration framework incorrectly considers the task as already completed, even though it was never actually executed.
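The behavior described above can be illustrated with a minimal sketch. The type and function names here (`migrationState`, `newState`, `isDone`) are hypothetical stand-ins, not Gitaly's actual identifiers; only the `doneCh` name comes from the report. The sketch assumes that "done" is detected by observing that `doneCh` is closed.

```go
package main

// migrationState is a hypothetical stand-in for the per-migration state.
type migrationState struct {
	doneCh chan struct{} // closed once the migration is considered done
	ran    bool          // set only if the migration body actually executed
}

// newState mimics the reported behavior: when the feature flag is off at
// registration time, the state starts with an already closed doneCh.
func newState(flagEnabled bool) *migrationState {
	s := &migrationState{doneCh: make(chan struct{})}
	if !flagEnabled {
		close(s.doneCh)
	}
	return s
}

// isDone reports whether doneCh is closed, without blocking.
func (s *migrationState) isDone() bool {
	select {
	case <-s.doneCh:
		return true
	default:
		return false
	}
}
```

A state created with `newState(false)` reports `isDone() == true` while `ran` is still `false`. Enabling the flag later cannot change that, because a closed channel cannot be reopened.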

A temporary workaround is to restart Gitaly, which refreshes the internal state map and allows the migration to run.

To fix this, we can remove the task from the state map if it is disabled by the feature flag. This ensures the framework doesn’t incorrectly mark the task as done. The overhead is that the disabled task must be checked every time a transaction starts. A previous attempt is MR !8048 (closed).
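As a rough sketch of this idea, assume the state map is keyed by task name and the feature flag can be evaluated through a callback. The names `registry`, `newRegistry`, and `stateFor` are hypothetical and do not reflect the approach taken in !8048. No state is stored while the flag is off, so nothing can be mistaken for a completed migration.

```go
package main

import "sync"

// registry is a hypothetical state map keyed by migration task name.
type registry struct {
	mu     sync.Mutex
	states map[string]chan struct{} // open doneCh per known task
}

func newRegistry() *registry {
	return &registry{states: make(map[string]chan struct{})}
}

// stateFor is called at the start of every transaction. It re-checks the
// flag each time (the overhead mentioned above) and only creates a state
// once the flag is enabled. While the flag is off it returns (nil, false)
// and leaves the map untouched.
func (r *registry) stateFor(task string, flagEnabled func() bool) (chan struct{}, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()

	if !flagEnabled() {
		return nil, false
	}
	doneCh, ok := r.states[task]
	if !ok {
		doneCh = make(chan struct{}) // open: the migration has not run yet
		r.states[task] = doneCh
	}
	return doneCh, true
}
```

This sketch only covers the bookkeeping. It does not address ordering the migration ahead of other transactions.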

The difficulty here is to guarantee the migration task executes before other transactions.

A migration could be started while earlier transactions are still in flight. This could lead to conflicts and cause the migration transaction to fail. Migration tasks are special: for example, the leftover cleanup migration deletes a lot of directories and files and will almost certainly conflict with concurrent transactions. That is why we want leftover cleanup to be a migration in the first place: this kind of task should run before any normal transaction can proceed.

When restarting Gitaly, this is not a problem, since a restart naturally finishes or cancels ongoing transactions. If we want to insert migration tasks while Gitaly is still running, we need to:

  1. Wait for all in-flight transactions to finish (or forcibly cancel them) and block new transactions from being scheduled;
  2. Run the migration while continuing to block other transactions.

Step 1 is the key and needs careful design.
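One possible shape for the two steps is a read-write gate, sketched below with Go's `sync.RWMutex`. The names `txGate`, `RunTransaction`, and `RunMigration` are hypothetical. The sketch takes the "wait" option of step 1 rather than forcibly cancelling transactions, and assumes every normal transaction goes through the gate.

```go
package main

import "sync"

// txGate is a hypothetical gate that every transaction passes through.
type txGate struct {
	mu sync.RWMutex
}

// RunTransaction runs a normal transaction under a read lock, so normal
// transactions can run concurrently with each other.
func (g *txGate) RunTransaction(fn func()) {
	g.mu.RLock()
	defer g.mu.RUnlock()
	fn()
}

// RunMigration takes the write lock. Lock waits for in-flight readers to
// finish, and a pending Lock also blocks new RLock calls (step 1). The
// migration then runs while the write lock keeps everyone else out (step 2).
func (g *txGate) RunMigration(fn func()) {
	g.mu.Lock()
	defer g.mu.Unlock()
	fn()
}
```

The trade-off is that a single long-running transaction delays the migration and stalls every transaction queued behind it. A real design would likely need timeouts or cancellation, which is the part that needs care.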

Edited by Eric Ju