Ensure ci_builds_metadata contains only processing data
Overview
ci_builds_metadata currently contains many columns. We should ensure that the data in all of them is safe to delete after a build is archived.
Work is in progress to clear data from some of the columns (see the discussion in Delete Ci::BuildMetadata after Ci::... (#538031 - closed)), which will help with space savings. However, there is a decent amount of operational efficiency to be gained if we can simply delete the row instead of updating a few columns to null.
Columns
| Column | Type | Nullable? | Mutable? | Removable on archive? #538031 (closed) | Where to move? |
|---|---|---|---|---|---|
| id | integer | no | N/A | | N/A |
| build_id | integer | no | N/A (we may deduplicate immutable data across builds) | | N/A |
| project_id | integer | no | N/A | | N/A |
| partition_id | integer | no | N/A | | N/A |
| timeout | integer | yes | yes (when job picked by runner) | | |
| timeout_source | integer | yes | yes (when job picked by runner) | | |
| interruptible | boolean | no (default | no | | |
| config_options | jsonb | yes | ? | | |
| config_variables | jsonb | yes | no | | |
| has_exposed_artifacts | boolean | yes. We care if it's | no | | |
| environment_auto_stop_in | character varying(255) | | | | !194402 (closed) being moved to |
| expanded_environment_name | character varying(255) | yes | no | | |
| secrets | jsonb | yes | no | | |
| id_tokens | jsonb | yes | no | | |
| debug_trace_enabled | boolean | no (default | yes | | |
| exit_code | smallint | yes | yes | | |
Top-level keys found in config_options
As of 2025-05-23:
[ gprd ] production> Ci::BuildMetadata.select(:config_options).last(300_000).flat_map { |md| md.config_options.keys }.uniq.sort
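To make the query above easier to follow, here is a minimal plain-Ruby sketch of the same key enumeration, run against in-memory hashes instead of the production table. The sample data and the `top_level_keys` helper are hypothetical, and the `nil` guard is an assumption about how a null column value might surface.

```ruby
# Hypothetical sample values standing in for Ci::BuildMetadata#config_options.
sample_config_options = [
  { "script" => ["make test"], "image" => { "name" => "ruby:3.3" } },
  { "script" => ["make build"], "artifacts" => { "expose_as" => "report" } },
  {},
  nil
]

# Collect every top-level key that appears in at least one row.
# `to_h` turns nil into an empty hash so missing values are skipped safely.
def top_level_keys(config_options_list)
  config_options_list.flat_map { |options| options.to_h.keys }.uniq.sort
end

p top_level_keys(sample_config_options)
```

Only top-level keys are collected; nested keys such as `expose_as` would need a recursive walk.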
NOTE: Ideally, intrinsic data should be moved to the table that best represents it. However, given the urgency, we could introduce a nullable, non-indexed column in p_ci_builds. For example, if artifacts:expose_as is intrinsic (non-processing) data, we could introduce p_ci_builds.artifacts_expose_as as jsonb and move the data there when a pipeline is archived or new jobs are created.
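As a rough illustration of the idea in the note, the sketch below pulls the intrinsic part out of a single config_options hash so it could be stored separately before the metadata row is deleted. It assumes artifacts:expose_as is the only intrinsic key. `INTRINSIC_PATHS` and `extract_intrinsic` are hypothetical names, not GitLab code, and the actual write to p_ci_builds is left out.

```ruby
# Hypothetical list of key paths treated as intrinsic (non-processing) data.
# Only artifacts:expose_as comes from the note above; everything else here
# is illustrative.
INTRINSIC_PATHS = [%w[artifacts expose_as]].freeze

# Returns the intrinsic values found in one config_options hash,
# keyed by a flattened name such as "artifacts_expose_as".
def extract_intrinsic(config_options)
  INTRINSIC_PATHS.each_with_object({}) do |path, kept|
    value = config_options.dig(*path)
    kept[path.join("_")] = value unless value.nil?
  end
end

options = {
  "script" => ["make test"],
  "artifacts" => { "expose_as" => "report", "paths" => ["out/"] }
}

intrinsic = extract_intrinsic(options)
puts intrinsic["artifacts_expose_as"]
```

Keeping the paths in one list makes it easy to extend the sketch as more keys in the table below are classified as intrinsic.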
| Top-level key | Nullable? | Mutable? | Removable on archive? | Where to move? |
|---|---|---|---|---|
| | yes | no | | |
| | yes | no | | |
| | yes | no | | @fabiopitino: this should be considered intrinsic data. Consider creating a dedicated table given the low usage of this feature which may help us deprecating it if needed. Alternatively, if stored in |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | yes | | |
| | yes | yes | | Moving to Redis |
| | yes | | | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | | | |
| | yes | | | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | | Likely isn't used after the release is created. See #545486 (comment 2547683632). | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |
| | yes | no | | |