Devise a benchmark that simulates production workloads

An investigation into the CNY Gitaly node's latency spikes identified two bottlenecks: one related to jbd2 contention, and the other related to inode contention.

We have a brainstorming issue for inode contention with different ideas. However, we need a way to test these ideas and quantify how much they reduce the bottlenecks without risking a production deployment each time.

We need a way to benchmark Gitaly transactions that re-creates the conditions that lead to the performance bottlenecks we've seen in production. This shorter feedback loop will allow us to try out different ideas quickly.

The current limitation of GPT is that it doesn't simulate reads and writes simultaneously.

Plan

We will reuse the existing Gitaly benchmarking tool as it's more fit-for-purpose than GPT or running k6 on its own. The tool currently doesn't automate the process of setting up test repositories, and only contains a single benchmark definition for the FindCommit RPC against the git source.

We need to:

Automate the process of seeding test repositories and creating the git-repos disk image on GCP. This allows us to easily modify the list of repositories as we attempt to find a good mix. !8027 (merged) tackles this.
Allow Gitaly transactions to be toggled.
Allow benchmarks to be executed concurrently across multiple Gitaly VMs with different configurations.
Execute a concurrent mix of RPCs against different repositories on the Gitaly host. Based on the production logs we extracted here, we need to implement k6 functions to call the following RPCs:

Implement WriteRef requests
Implement DeleteRefs requests
Implement ListCommitsByOid requests
Implement GetBlobs requests
Implement GetTreeEntries requests
Implement TreeEntry requests

For each repository we intend to test against, we need to extract a set of test data that can be used in the RPC requests. For example, these would be OIDs for GetBlobs calls, OIDs for ListCommitsByOid calls, and a pool of randomly generated references for WriteRef and DeleteRefs calls.
Use a bpftrace script to instrument off-CPU time on the Gitaly VM to identify whether we're hitting bottlenecks similar to those we saw in cny.
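
As a rough illustration of the test-data and RPC-mix items above, the sketch below generates a reproducible pool of reference names for WriteRef and DeleteRefs calls and picks RPCs from a weighted mix. It is only a sketch under assumptions: the function names, the refs/heads/benchmark/ prefix, and the equal weights are all hypothetical placeholders, not taken from the Gitaly benchmarking tool or from the production logs. It deliberately avoids k6 imports so it can be run on its own.

```typescript
// Hypothetical sketch: none of these names come from the existing tooling.

// Small seeded PRNG (mulberry32-style) so a given seed always yields the
// same sequence, which keeps benchmark runs comparable.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Build a pool of unique reference names for WriteRef / DeleteRefs calls.
// The "refs/heads/benchmark/" prefix is an assumed placeholder.
function generateRefPool(count: number, rand: () => number): string[] {
  const refs = new Set<string>();
  while (refs.size < count) {
    const hex = Math.floor(rand() * 0x100000000).toString(16);
    const suffix = ("00000000" + hex).slice(-8); // zero-pad to 8 hex digits
    refs.add("refs/heads/benchmark/" + suffix);
  }
  return Array.from(refs);
}

interface WeightedRpc {
  name: string;
  weight: number; // relative frequency; only the ratios matter
}

// Pick one RPC name with probability proportional to its weight.
function pickRpc(mix: WeightedRpc[], rand: () => number): string {
  if (mix.length === 0) {
    throw new Error("RPC mix must not be empty");
  }
  const total = mix.reduce((sum, rpc) => sum + rpc.weight, 0);
  let threshold = rand() * total;
  let fallback = "";
  for (const rpc of mix) {
    fallback = rpc.name;
    threshold -= rpc.weight;
    if (threshold < 0) {
      return rpc.name;
    }
  }
  return fallback; // only reached through floating-point rounding
}

// Placeholder weights: the real ratios would come from the production logs.
const EXAMPLE_MIX: WeightedRpc[] = [
  { name: "WriteRef", weight: 1 },
  { name: "DeleteRefs", weight: 1 },
  { name: "ListCommitsByOid", weight: 1 },
  { name: "GetBlobs", weight: 1 },
  { name: "GetTreeEntries", weight: 1 },
  { name: "TreeEntry", weight: 1 },
];
```

In a k6 script, something like pickRpc(EXAMPLE_MIX, rand) could be called once per iteration to decide which request function to run, so that reads and writes are interleaved within the same test. Seeding the generator makes the reference pool identical across runs, which should help when comparing runs with transactions toggled on and off.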

Edited Sep 09, 2025 by James Liu
