Backup CronJob "safe-to-evict=false" autoscaler annotation preventing scheduling in GKE Autopilot

Summary

Our backup CronJob currently hardcodes the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: "false" on its pod. In GKE Autopilot clusters this causes FailedCreate events with the following error:

  Warning  FailedCreate  2m3s (x7 over 12m)  job-controller  Error creating: admission webhook "policycontrollerv2.common-webhooks.networking.gke.io" denied the request: GKE Policy Controller rejected the request because it violates one or more policies: {"[denied by autogke-node-affinity-selector-limitation]":["Auto GKE disallows use of cluster-autoscaler.kubernetes.io/safe-to-evict=false annotation on workloads"]}
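
For orientation, the rejected annotation sits on the pod template of the rendered CronJob. The fragment below is a trimmed sketch rather than actual chart output: the resource name assumes a Helm release called "gitlab", and all fields other than the annotation are omitted.

```yaml
# Trimmed, illustrative sketch of the rendered CronJob; not actual chart output.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: gitlab-toolbox-backup   # assumed: Helm release named "gitlab"
spec:
  # schedule, containers, and other fields omitted for brevity
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            # The hardcoded annotation that the Autopilot admission webhook rejects.
            cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```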

Steps to reproduce

  1. Create a GKE Autopilot cluster
  2. Install the GitLab Helm Chart with the backup cron enabled (scheduled every 5 minutes so the failure shows up quickly):
     gitlab:
       toolbox:
         backups:
           cron:
             enabled: true
             schedule: "*/5 * * * *"
  3. Observe the FailedCreate events for the gitlab-toolbox-backup job.

Discussion

Having GKE evict the backup pod seems problematic, and it makes sense that the Job pod is marked safe-to-evict=false.

However, depending on the cluster configuration, eviction of the backup pod may be a low-probability event, and the risk might be acceptable here.

Given that we have also blogged about using the Helm Chart with Autopilot, and that this pod is the only one in the chart with an explicit cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation, would it make sense to add something like a gitlab.toolbox.backups.cron.allowEviction conditional that leaves out the annotation so that the pod can be scheduled in Autopilot?
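
One way such a conditional could look in the CronJob template is sketched below. The allowEviction key is the hypothetical name suggested above and does not exist in the chart. The .Values path assumes the template lives in the toolbox subchart, where the gitlab.toolbox prefix is dropped. Leaving the value unset or false would keep the current behavior for existing installs.

```yaml
# Sketch only: fragment of a CronJob template in the toolbox subchart.
# `allowEviction` is a hypothetical value; it is not part of the chart today.
spec:
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
          {{- if not .Values.backups.cron.allowEviction }}
            # Default: keep protecting the backup pod from autoscaler eviction.
            cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
          {{- end }}
```

An Autopilot install could then opt out of the annotation through its values, again assuming the hypothetical key:

```yaml
gitlab:
  toolbox:
    backups:
      cron:
        enabled: true
        allowEviction: true   # hypothetical flag: omit the safe-to-evict annotation
```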

Edited by Jason Young