Introduce gvltools to measure Sidekiq GVL duration

What does this MR do and why?

In gitlab-com/gl-infra/data-access/durability/team#289 (comment 2814450208), we hypothesized that thread contention on the GVL (Ruby's Global VM Lock) could make threads holding DB connections wait longer, and therefore contribute to pgbouncer saturation.

We currently have no reliable metric for the impact of GVL contention.

This MR adds the gvltools gem to measure threads' GVL wait time. With this metric, we should be able to confirm whether thread contention was indeed a contributing factor in pgbouncer saturation, and thus in the sidekiq_queueing apdex drops.

The measurement is behind a feature flag with pod actors. It is not meant to be enabled for the whole fleet because of the 1-5% overhead noted in the gvltools README (https://github.com/Shopify/gvltools?tab=readme-ov-file#usage):

Note that using the GVL instrumentation API adds some overhead each time Ruby has to switch threads. In production it is recommended to only enable it on a subset of processes or for small periods of time as to not impact the average latency too much.

The exact overhead is not yet known, early testing showed a 1-5% slowdown on a fully saturated multi-threaded app.
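
As a rough illustration of the approach described above (not the code in this MR), the sketch below shows a Sidekiq-style server middleware that reads a per-thread GVL wait counter before and after a job and records the difference only when a flag is on. The class name `GvlWaitMiddleware`, the `enabled` and `recorder` callables, and the `FakeTimer` stub are all hypothetical. In a real process the timer would presumably be `GVLTools::LocalTimer` (after calling `GVLTools::LocalTimer.enable`), whose `monotonic_time` reports nanoseconds spent waiting on the GVL. The stub is used here only so the example runs without the gem.

```ruby
# Hypothetical Sidekiq-style server middleware; all names are illustrative.
class GvlWaitMiddleware
  # timer:    object responding to #monotonic_time (nanoseconds waited on the GVL),
  #           assumed to be GVLTools::LocalTimer in a real process
  # enabled:  callable returning true when the feature flag is on for this pod
  # recorder: callable receiving (queue, wait_ns), e.g. a metrics observer
  def initialize(timer:, enabled:, recorder:)
    @timer = timer
    @enabled = enabled
    @recorder = recorder
  end

  def call(_worker, _job, queue)
    # Skip all instrumentation when the flag is off.
    return yield unless @enabled.call

    before = @timer.monotonic_time
    begin
      yield
    ensure
      # Record the wait even if the job raises.
      @recorder.call(queue, @timer.monotonic_time - before)
    end
  end
end

# Stub timer that replays fixed readings, standing in for the real gem.
class FakeTimer
  def initialize(readings)
    @readings = readings.dup
  end

  def monotonic_time
    @readings.shift
  end
end

recorded = []
middleware = GvlWaitMiddleware.new(
  timer: FakeTimer.new([1_000, 4_500]),
  enabled: -> { true },
  recorder: ->(queue, wait_ns) { recorded << [queue, wait_ns] }
)

result = middleware.call(nil, {}, "default") { :done }
```

Injecting the timer and the flag check keeps the disabled path free of any timer calls, in line with the README's advice to enable instrumentation only on a subset of processes.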

References

Sidekiq version of !142419

Also from gitlab-com/runbooks!9566 (comment 2822828374)

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Marco Gregorius
