From 68343305d859b683d952f6516dbd5a91db79bdf4 Mon Sep 17 00:00:00 2001
From: Karthik Nayak
Date: Fri, 25 Jul 2025 11:01:59 +0200
Subject: [PATCH] doc: Document the reftable rollout process

To roll out reftables to `.com`, we need to ensure that all the required
information is aggregated in one spot. Add a document to address the
rollout process and also to talk about different scenarios and how to
handle them.
---
 doc/reftable_rollout.md | 94 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)
 create mode 100644 doc/reftable_rollout.md

diff --git a/doc/reftable_rollout.md b/doc/reftable_rollout.md
new file mode 100644
index 00000000000..fce41b84b63
--- /dev/null
+++ b/doc/reftable_rollout.md
@@ -0,0 +1,94 @@
+# Reftable rollout process
+
+The [reftable backend](https://about.gitlab.com/blog/whats-new-in-git-2-45-0/#reftables-a-new-backend-for-storing-references) is a new reference backend in Git. It is an alternative to the traditional files backend. Gitaly has so far used the files backend, and we plan to move to the new reftable backend.
+
+## Prerequisites for reftables
+
+The reftable rollout currently depends on the WAL rollout, since the migration logic works with transactions. As such, we can only roll out reftables together with the WAL. Reftables also inherently do not work with Praefect: each reference update adds a new table with a random filename, which leads to mismatches in Praefect's voting mechanism.
+
+## Rollout to staging
+
+Reftables are already rolled out to staging with the opportunistic migrator described below. As such, most of the active repositories on staging are already using reftables. We have not seen any significant issues with reftables on staging, which gives us confidence to roll out to production.
+
+## Rollout to .com
+
+When talking about the rollout, it is essential to understand that there are two distinct flows we need to target:
+
+1. New repositories
+2. Existing repositories
+
+The difference is that new repositories have no existing state and require no migration, while existing repositories need their state migrated.
+
+### New Repositories
+
+All new repositories in Gitaly are created via either `git-clone(1)` or `git-init(1)`. Both commands take a `--ref-format=` argument, which dictates the reference backend used.
+
+To use reftables for new repositories, we have to enable the `gitaly_new_repo_reftable_backend` feature flag.
+
+### Existing Repositories
+
+Migrating existing repositories is a little more complicated. Git provides the `git refs migrate` subcommand (see `git-refs(1)`) for migrating repositories from one backend to another. However, we need to ensure that the migration doesn't conflict with other incoming writes to the repository, especially in large repositories, where migration can take a long time.
+
+Since the WAL provides exclusive snapshotting with conflict handling, we use it to provide a mechanism for migrating existing repositories.
+
+#### MigrateReferenceBackend RPC
+
+To migrate a specific repository, we provide the `MigrateReferenceBackend` RPC. This is useful for migrating single repositories for testing, and for reverting them if any performance degradation is observed.
+
+We can find and migrate repositories from the rails console as follows:
+
+```ruby
+project = Project.find_by_full_path('gitlab-org/gitlab-test')
+gitaly_repository_client = Gitlab::GitalyClient::RepositoryService.new(project.repository)
+
+# Prints the current reference backend used by the repository.
+gitaly_repository_client.repository_info
+
+# To migrate to the files backend.
+gitaly_repository_client.migrate_reference_backend
+
+# To migrate to the reftable backend.
+gitaly_repository_client.migrate_reference_backend(to_reftable: true)
+```
+
+#### Opportunistic Migrator
+
+While migrating single repositories is a quick way to test the reftable backend, a wider rollout needs a more automated process. The opportunistic migrator adds a middleware to all incoming RPCs and initiates a migration when:
+
+1. There is an incoming read request
+2. There is a finishing write request
+
+A background goroutine picks up pending migrations one at a time and performs each in a separate transaction. This ensures that we do not interfere with the RPC itself. The migrator also cancels any ongoing migration for a repository if there is a new write request for that repository. This avoids write conflicts as far as possible.
+
+To enable the opportunistic migrator, use the `gitaly_reftable_migration` feature flag.
+
+#### Forced migration
+
+The opportunistic migrator _attempts_ to migrate repositories but doesn't guarantee migration of all repositories. Once we gain enough confidence in reftables, we will want to force migration of the remaining repositories. This can be done by adding a blocking migrator.
+
+The WAL provides a framework for adding such migrations, which are run before any other request for the repository is processed. We can use this framework and plug in our existing migration code to ensure that all repositories are converted to reftables.
+
+## Metrics and Logging
+
+* [Grafana Dashboard](https://dashboards.gitlab.net/d/ce1mnfe77u9s0f/reftable-rollout?from=now-30m&orgId=1&timezone=browser&to=now&var-PROMETHEUS_DS=mimir-gitlab-gstg&var-stage=main)
+* [Logs for Opportunistic Migrator](https://nonprod-log.gitlab.net/app/r/s/iu5RU)
+
+## Reverting
+
+#### Revert single repository
+
+To revert a single repository, we can use the `MigrateReferenceBackend` RPC to target that repository and migrate it to the files backend.
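A minimal sketch of the single-repository revert, assuming only the two rails console client calls already shown in this document. `revert_to_files_backend` is a hypothetical helper name, not part of GitLab or Gitaly, and nothing is assumed about the shape of the `repository_info` response:

```ruby
# Hypothetical helper, not part of GitLab or Gitaly. `client` is expected to
# respond to the two calls already used in this document.
def revert_to_files_backend(client)
  # Called without arguments, the migration targets the files backend
  # (see the rails console example earlier in this document).
  client.migrate_reference_backend

  # Return the repository info so the caller can confirm the backend in use.
  client.repository_info
end

# In the rails console:
#   project = Project.find_by_full_path('gitlab-org/gitlab-test')
#   client = Gitlab::GitalyClient::RepositoryService.new(project.repository)
#   revert_to_files_backend(client)
```

Passing the client in keeps the helper independent of how the project is looked up, and returning the repository info makes it easy to check the result of the revert straight away.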
+
+#### Reverting Multiple repositories
+
+* We can still use the same approach as for a single repository to revert multiple repositories.
+* If there is repository corruption on a large scale, we'll have to restore from backup or resort to case-by-case fixes. A generic revert solution will not apply here.
+* Since the reftable rollout follows the WAL rollout, we should ensure that the WAL is already stable on a given node before migrating repositories on that node to reftables. If we foresee a rollback of the WAL, we would need to add custom logic to revert repositories to the files backend. This can follow the recovery logic of the WAL in `internal/gitaly/storage/storagemgr/middleware_recovery.go`.
+
+## Dedicated
+
+TBA
+
+## Self-Hosted
+
+TBA
\ No newline at end of file
-- 
GitLab