Make it easier to verify and deploy memory settings
Problem
In #289838 (closed) we unsuccessfully attempted to tweak the memory allocator and Ruby GC of the gitlab-rails
Ruby service. The major problems we ran into were:
- Establishing a reliable baseline against which to gauge success is hard.
- The performance test suites we run (via gpt) did not end up reflecting how SaaS responded to the same changes in production.
- It took a very long time to even have something we can test in production, and the change management process for rolling these out is slow and involves tight coordination between Memory and Infrastructure engineers.
Suggestion
To both increase our confidence in making such changes and reduce the time and effort it takes to verify them in production, we should consider:
- Having a performance test suite that better reflects production behavior, or
- Having the means to easily experiment with runtime settings in production without introducing extra risk and without involving engineers from other teams (there are some suggestions for how that could work in gitlab-com/gl-infra/scalability#154)