CreateFork is slow and duplicates all objects

CreateFork has always used --no-local (#817 (closed)) to clone from any internal repository. This disables the --local behavior, which https://www.git-scm.com/docs/git-clone describes as follows:

When the repository to clone from is on a local machine, this flag bypasses the normal "Git aware" transport mechanism and clones the repository by making a copy of HEAD and everything under objects and refs directories. The files under .git/objects/ directory are hardlinked to save space when possible.

I think this was done because it makes it easy to generalize cloning a repository from any shard, and it potentially avoids copying over cruft in the main repo.

However, the problem is that it's very slow. On my test instance with gitlab-org/gitlab, it takes over 2 minutes to complete:

# time git clone --bare --no-local /var/opt/gitlab/git-data/repositories/@hashed/c2/35/c2356069e9d1e79ca924378153cfbbfb4d4416b1f99d41a2940bfdb66c5319db.git /tmp/no-local.git
Cloning into bare repository '/tmp/no-local.git'...
remote: Counting objects: 2660817, done.
remote: Total 2660817 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (2660817/2660817), 1.04 GiB | 27.21 MiB/s, done.
Resolving deltas: 100% (2062629/2062629), done.

real    2m16.601s
user    2m17.052s
sys     0m12.271s

Compare that with a --local --no-hardlinks clone of the same repo, which is deduplicated against a pool repository:

# time git clone --bare --local --no-hardlinks /var/opt/gitlab/git-data/repositories/@hashed/c2/35/c2356069e9d1e79ca924378153cfbbfb4d4416b1f99d41a2940bfdb66c5319db.git /tmp/local.git
Cloning into bare repository '/tmp/local.git'...
done.

real    0m0.221s
user    0m0.106s
sys     0m0.108s

It turns out the local clone also brings in objects/info/alternates, which makes it fast because it doesn't have to copy every object from the pool repository. I thought !2887 (merged) would do the same thing, but --no-local effectively negates that.
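
One way to confirm whether a clone kept its link to the pool repository is to look for a non-empty objects/info/alternates file. The following is a minimal sketch: `uses_alternates` is a hypothetical helper name, and the paths are the two clone targets from the timings above.

```shell
# Hypothetical helper: succeeds if the bare repository at $1 borrows objects
# from another repository through a non-empty objects/info/alternates file.
uses_alternates() {
    [ -s "$1/objects/info/alternates" ]
}

# Compare the two clone targets used in the timings above.
# A path that does not exist is simply reported as a standalone copy.
for repo in /tmp/no-local.git /tmp/local.git; do
    if uses_alternates "$repo"; then
        echo "$repo: uses alternates"
    else
        echo "$repo: standalone copy"
    fi
done
```

If the observation above holds, /tmp/local.git should be reported as using alternates, while /tmp/no-local.git should be a standalone copy containing every object.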

Proposal:

  1. If the source repo is on the same shard as the target repo, use --local.
  2. Introduce a boolean field to enable this behavior.
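
The proposal could be sketched as a small shell helper. This is only an illustration, not Gitaly's actual implementation: `choose_clone_flags`, its arguments, and the storage names in the usage example are hypothetical, while the flags themselves are the ones used in the timings above.

```shell
# Hypothetical helper: print the git-clone flags to use for a fork.
#   $1 = storage (shard) name of the source repository
#   $2 = storage (shard) name of the target repository
#   $3 = "true" if the proposed boolean field enabling local clones is set
choose_clone_flags() {
    if [ "$3" = "true" ] && [ "$1" = "$2" ]; then
        # Same shard and feature enabled: copy files directly, which also
        # carries over objects/info/alternates.
        echo "--bare --local --no-hardlinks"
    else
        # Different shard or feature disabled: keep the current behavior.
        echo "--bare --no-local"
    fi
}
```

For example, `choose_clone_flags default default true` prints `--bare --local --no-hardlinks`, and any other combination falls back to `--bare --no-local`.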