Devise a benchmark that simulates production workloads
Based on an investigation into the CNY Gitaly node's latency spikes, we identified two bottlenecks: one related to jbd2 contention and the other related to inode contention.
We have a brainstorming issue collecting different ideas for reducing inode contention. However, we need a way to test these ideas and quantify how much each one reduces the bottlenecks, without risking a production deployment every time.
We need a way to benchmark Gitaly Transactions that re-creates the conditions behind the performance bottlenecks we've seen in production. This shorter feedback loop will let us try out different ideas quickly.
The current limitation of GPT (the GitLab Performance Tool) is that it doesn't simulate reads and writes simultaneously.
Plan
We will reuse the existing Gitaly benchmarking tool because it is better suited to this work than GPT or running k6 on its own. The tool currently doesn't automate setting up test repositories, and it contains only a single benchmark definition: the FindCommit RPC against the git source.
We need to:
- Automate the process of seeding test repositories and creating the git-repos disk image on GCP. This lets us easily modify the list of repositories as we search for a good mix. !8027 (merged) tackles this.
- Allow Gitaly transactions to be toggled.
- Allow benchmarks to be executed concurrently across multiple Gitaly VMs with different configurations.
- Execute a concurrent mix of RPCs against different repositories on the Gitaly host. Based on the production logs we extracted here, we need to implement k6 functions to call the following RPCs:
  - Implement WriteRef requests
  - Implement DeleteRefs requests
  - Implement ListCommitsByOid requests
  - Implement GetBlobs requests
  - Implement GetTreeEntries requests
  - Implement TreeEntry requests
- For each repository we intend to test against, extract a set of test data to use in the RPC requests: for example, OIDs for GetBlobs calls, OIDs for ListCommitsByOid calls, and a pool of randomly generated references for WriteRef and DeleteRefs calls.
- Use a bpftrace script to measure off-CPU time on the Gitaly VM and identify whether we're hitting the same bottlenecks we saw in cny.
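A minimal sketch of the test-data extraction step, assuming each seeded repository is available locally as a Git repository. It lists every object with `git cat-file --batch-check --batch-all-objects`, samples commit and blob OIDs, and generates a pool of reference names. The `refs/heads/benchmark/` namespace, the `RepoTestData` type, and the sampling parameters are hypothetical choices for illustration, not part of the existing benchmarking tool. Because `--batch-all-objects` also reports unreachable objects, a real extractor might prefer reachable ones (for example via `git rev-list`).

```python
import random
import subprocess
from dataclasses import dataclass, field

# Hypothetical namespace for benchmark-owned references, so WriteRef and
# DeleteRefs calls never touch the repositories' real branches.
REF_PREFIX = "refs/heads/benchmark/"


@dataclass
class RepoTestData:
    commit_oids: list[str] = field(default_factory=list)  # ListCommitsByOid inputs
    blob_oids: list[str] = field(default_factory=list)    # GetBlobs inputs
    ref_pool: list[str] = field(default_factory=list)     # WriteRef / DeleteRefs targets


def list_objects(repo_path: str) -> str:
    """Return one '<oid> <type> <size>' line per object in the repository."""
    result = subprocess.run(
        ["git", "-C", repo_path, "cat-file", "--batch-check", "--batch-all-objects"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def extract_test_data(batch_check_output: str, sample_size: int,
                      ref_count: int, seed: int = 0) -> RepoTestData:
    """Sample commit and blob OIDs and generate a pool of random ref names."""
    rng = random.Random(seed)  # seeded so every run uses the same test data
    commits: list[str] = []
    blobs: list[str] = []
    for line in batch_check_output.splitlines():
        if not line.strip():
            continue
        oid, obj_type, _size = line.split()
        if obj_type == "commit":
            commits.append(oid)
        elif obj_type == "blob":
            blobs.append(oid)
    return RepoTestData(
        commit_oids=rng.sample(commits, min(sample_size, len(commits))),
        blob_oids=rng.sample(blobs, min(sample_size, len(blobs))),
        ref_pool=[f"{REF_PREFIX}{rng.getrandbits(64):016x}" for _ in range(ref_count)],
    )
```

The result could be serialized with `json.dump(dataclasses.asdict(data), f)` so that the k6 scripts can load it, for example through k6's `SharedArray`. Keeping the benchmark refs under their own prefix makes it easy to clean them up between runs.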
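The RPC mix can be sketched as weighted random selection followed by concurrent dispatch, so that read RPCs (ListCommitsByOid, GetBlobs, GetTreeEntries, TreeEntry) and write RPCs (WriteRef, DeleteRefs) overlap in time, which is exactly what GPT doesn't do. This is a minimal Python illustration of the scheduling logic only: the numbers in `RPC_WEIGHTS` are placeholders rather than ratios taken from the production logs, and the handlers stand in for the k6 functions that would issue the actual gRPC calls.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Placeholder weights: replace with the per-RPC call ratios from the
# production logs. These numbers are illustrative, not measured.
RPC_WEIGHTS: dict[str, int] = {
    "WriteRef": 5,
    "DeleteRefs": 2,
    "ListCommitsByOid": 20,
    "GetBlobs": 30,
    "GetTreeEntries": 25,
    "TreeEntry": 18,
}
WRITE_RPCS = {"WriteRef", "DeleteRefs"}


def build_schedule(weights: dict[str, int], n: int, seed: int = 0) -> list[str]:
    """Draw n RPC names, each with probability proportional to its weight."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=n)


def run_mix(schedule: list[str], handlers: dict[str, Callable[[], None]],
            workers: int = 8) -> Counter:
    """Dispatch the scheduled calls from a thread pool so reads and writes overlap."""
    completed: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [(name, pool.submit(handlers[name])) for name in schedule]
        for name, future in futures:
            future.result()  # re-raises any exception from the handler
            completed[name] += 1
    return completed
```

Fixing the seed makes the schedule reproducible, so Gitaly VMs running with transactions enabled and disabled would receive an identical sequence of requests, and any difference in latency comes from the configuration rather than the workload.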
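Off-CPU time is the time a thread spends descheduled, for example while waiting on a journal commit or a lock, between leaving the CPU and being scheduled again. A bpftrace script would typically derive it from scheduler context-switch events. The sketch below shows only the accounting step, in Python, over a list of already-exported switch events; the `SwitchEvent` record, its fields, and the stack labels are hypothetical and do not correspond to any specific bpftrace output format.

```python
from collections import defaultdict
from dataclasses import dataclass

IDLE_TID = 0  # the per-CPU idle task; its "off-CPU" time is not interesting


@dataclass(frozen=True)
class SwitchEvent:
    """One context switch: prev_tid leaves the CPU and next_tid starts running."""
    timestamp_ns: int
    prev_tid: int
    prev_stack: str  # where prev_tid was when it was switched out
    next_tid: int


def off_cpu_time(events: list[SwitchEvent]) -> dict[str, int]:
    """Total nanoseconds threads spent switched out, grouped by blocking stack."""
    switched_out: dict[int, tuple[int, str]] = {}  # tid -> (switch-out time, stack)
    totals: defaultdict[str, int] = defaultdict(int)
    for ev in sorted(events, key=lambda e: e.timestamp_ns):
        # The outgoing thread starts an off-CPU interval.
        if ev.prev_tid != IDLE_TID:
            switched_out[ev.prev_tid] = (ev.timestamp_ns, ev.prev_stack)
        # The incoming thread ends its interval, if we saw it leave the CPU.
        start = switched_out.pop(ev.next_tid, None)
        if start is not None:
            started_ns, stack = start
            totals[stack] += ev.timestamp_ns - started_ns
    return dict(totals)
```

Grouping by the blocking stack is what would let us separate journaling-related waits from inode-related waits and compare them against what we saw in cny.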