Site Reliability Engineering: Automated Restore Support

Summary

One of the core concepts of good Site Reliability Engineering is not just taking backups but restoring them, and restoring them often. I see that the GitLab Helm Chart has a CronJob object which spins up a short-lived gitlab-toolbox pod to generate a backup file using GitLab's backup-utility binary.

This is all great and fine! Let's take it a step further!

Feature Request

What I'd like to request is a CronJob object which spins up a short-lived gitlab-toolbox pod, picks up the latest backup file present, and uses backup-utility to attempt to restore it. The use case is of course a non-production environment, where regular restores confirm that our backups are valid and usable and help us verify configuration changes for our use cases. We could then alert on the age of the most recent successful restore in the non-production environment, firing when that age exceeds a threshold.
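To make the request a bit more concrete, here is a minimal Python sketch of the two decisions such a job and its alert would need to make: which backup is the latest, and whether the last restore is too old. It is only an illustration, not how the chart works today. It assumes backup archives are named with a leading Unix timestamp and end in `_gitlab_backup.tar`; the function names (`backup_epoch`, `latest_backup`, `restore_is_stale`) and the sample file names are hypothetical. The actual restore would still be done by backup-utility and is not shown here.

```python
from typing import Iterable, Optional

# Assumption: backup archives look like "<epoch>_<date>_<version>_gitlab_backup.tar".
BACKUP_SUFFIX = "_gitlab_backup.tar"


def backup_epoch(name: str) -> Optional[int]:
    """Return the leading Unix timestamp of a backup file name, or None if it does not match."""
    if not name.endswith(BACKUP_SUFFIX):
        return None
    prefix = name.split("_", 1)[0]
    return int(prefix) if prefix.isdigit() else None


def latest_backup(names: Iterable[str]) -> Optional[str]:
    """Pick the file name with the newest leading timestamp, ignoring unrelated files."""
    candidates = [(backup_epoch(n), n) for n in names]
    candidates = [(epoch, n) for epoch, n in candidates if epoch is not None]
    if not candidates:
        return None
    return max(candidates)[1]


def restore_is_stale(last_restore_epoch: int, now_epoch: int, max_age_seconds: int) -> bool:
    """True when the last successful restore is older than the allowed threshold."""
    return now_epoch - last_restore_epoch > max_age_seconds


if __name__ == "__main__":
    # Hypothetical listing of a backup bucket or directory.
    files = [
        "1700000000_2023_11_14_16.5.1_gitlab_backup.tar",
        "1700086400_2023_11_15_16.5.1_gitlab_backup.tar",
        "README.txt",
    ]
    print(latest_backup(files))
    # Alert if the last restore is more than one day old.
    print(restore_is_stale(1700000000, 1700090000, 24 * 60 * 60))
```

The restore job would feed the selected file to backup-utility, record the time of a successful restore, and the alert would compare that time against the threshold.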

Context

Edited by Kurt Bomya