Draft: Replicate Git repos from project_repositories table instead of projects

What does this MR do and why?

Overview

This MR adds a new version of project repository replication behind a feature flag (FF). The new version enumerates the project_repositories table instead of the projects table for Git repository verification and replication. As a result, only projects with actual Git repositories are tracked, which should significantly reduce errors caused by Gitaly attempting to fetch non-existent repositories.
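The difference between the two enumeration strategies can be sketched in plain Ruby. This is a simplified illustration, not the code in this MR: the `Project` and `ProjectRepository` structs, the sample data, and both helper methods are hypothetical stand-ins for rows of the projects and project_repositories tables.

```ruby
# Hypothetical stand-ins for rows in the projects and project_repositories tables.
Project = Struct.new(:id, :name)
ProjectRepository = Struct.new(:id, :project_id)

PROJECTS = [
  Project.new(1, "with-repo"),
  Project.new(2, "no-repo-yet"),
  Project.new(3, "also-with-repo")
].freeze

# Assumption for this example: only projects 1 and 3 have a Git repository.
PROJECT_REPOSITORIES = [
  ProjectRepository.new(10, 1),
  ProjectRepository.new(11, 3)
].freeze

# Legacy enumeration: every project is a replication candidate,
# including projects whose repository does not exist.
def legacy_candidates(projects)
  projects.map(&:id)
end

# v2 enumeration: only projects that have a project_repositories row.
def v2_candidates(project_repositories)
  project_repositories.map(&:project_id)
end

# Projects the legacy path would try to fetch even though no repository exists.
def missing_repositories(projects, project_repositories)
  legacy_candidates(projects) - v2_candidates(project_repositories)
end
```

In this sample data, project 2 is the only one the legacy enumeration would track without a repository behind it.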

The implementation plan

Explained in &17974 (comment 2601143198):

We will now have 2 versions of project repo replication behind FFs, until we're confident the new one works in all scenarios and the legacy one can be retired.

For clarity in discussions, the current project repo replication, which sits behind the FF geo_project_repository_replication, is called legacy project repo replication.

This MR has the following goals:

  1. Introduce project repo replication v2 behind the FF geo_project_repository_replication_v2. geo_project_repository_replication enables project repo replication using the legacy replication, and geo_project_repository_replication_v2 enables the v2 replication. geo_project_repository_replication_v2 has no effect if geo_project_repository_replication is off.
  2. Use the same registry class, ProjectRepositoryRegistry, for both versions so we don't duplicate replication for projects when switching v2 replication on. This needs to be tested further. The registry will contain a project_id and a new project_repository_id as foreign keys.
  3. Programmatically test both code paths, separately and when v2 is enabled after legacy replication.
     • What happens when the registry already has records?
     • How do we ensure repos are not replicated twice for the same project?
     • Once v2 is enabled, use v2 for all existing registry entries with a project_id, i.e. those that were created using the legacy replication.
     • Ensure the results in the UI are consistent, i.e. when 20 project repos exist, exactly those show up as verified/replicated.
  4. Manual testing plan: Geo: Test project repository replication v2 (#557238)
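The flag interaction described in the first goal can be sketched as a small decision function. The flag names are taken from this description; the `replication_mode` method and the plain hash of flag states are assumptions made for illustration and are not how feature flags are actually checked in the codebase.

```ruby
# Resolves which replication path is active from the two feature flags.
# `flags` is a plain hash of flag name => boolean (a stand-in for real FF checks).
def replication_mode(flags)
  # If the legacy flag is off, replication is off, regardless of the v2 flag.
  return :disabled unless flags.fetch("geo_project_repository_replication", false)

  # With the legacy flag on, the v2 flag selects between the two versions.
  flags.fetch("geo_project_repository_replication_v2", false) ? :v2 : :legacy
end
```

The key property is that turning on only geo_project_repository_replication_v2 resolves to `:disabled`, matching the statement that v2 has no effect when the legacy flag is off.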

While this approach is complicated, I decided to not create a new registry, because, in that case, once v2 is enabled, all projects would have to be resynced, which in large instances will take a long time.
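One way to picture how a shared registry avoids a full resync: when v2 encounters a repository whose project already has a registry row from legacy replication, it links that row to the repository instead of creating a new one, so the existing sync state is kept. The sketch below is an assumption-laden illustration; `RegistryRow`, `track_v2`, and the in-memory array standing in for the registry table are all hypothetical.

```ruby
# Hypothetical in-memory stand-in for a registry row with both foreign keys.
RegistryRow = Struct.new(:project_id, :project_repository_id, :state)

# Finds the registry row for the project and links it to the repository.
# A new pending row is created only when the project has no row yet.
def track_v2(registry, project_id:, project_repository_id:)
  row = registry.find { |r| r.project_id == project_id }

  if row
    # Reuse the legacy row: keep its state, only fill in the new foreign key.
    row.project_repository_id ||= project_repository_id
  else
    row = RegistryRow.new(project_id, project_repository_id, :pending)
    registry << row
  end

  row
end
```

Calling `track_v2` repeatedly for the same project leaves a single row in the registry, which is the behavior the "not replicated twice" test question above is concerned with.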

Another reason is that, because of the way our Self-Service Framework works, many class names are derived from the model class name, although this could potentially be worked around.

I want to make sure the switch between both replication versions is seamless.

Also, we might want to switch off v2 after having used it. In that case, it should be possible to switch the registry back to using project_id as the main foreign key instead of project_repository_id.
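Switching the main foreign key by replication version could look roughly like the following. This is only a sketch of the idea: `registry_lookup_key`, `find_registry_row`, and the hash-based rows are hypothetical and do not reflect the actual registry model.

```ruby
# Picks which foreign key identifies a registry row for the active mode.
def registry_lookup_key(mode)
  mode == :v2 ? :project_repository_id : :project_id
end

# Looks up a row (a plain hash here) by the key that matches the mode.
def find_registry_row(rows, mode, id)
  key = registry_lookup_key(mode)
  rows.find { |row| row[key] == id }
end
```

Because each row carries both foreign keys, the same row stays reachable after v2 is switched off again; only the key used for the lookup changes.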

We won't add new fields, such as project_repositories_count, to doc/api/geo_nodes.md based on the replication version used. This keeps things simple and reflects the current status regardless of the replication choice.

After this MR:

Related issues:

  • Geo: Enumerate project_repositories instead of ... (#546175)
  • Geo: Insert a record in project_repositories on... (#546176)
  • Geo: Do not attempt to verify/replicate Git rep... (#546179 - closed)

References

Screenshots or screen recordings

Before After

How to set up and validate locally

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Aakriti Gupta
