Proposal: Deduplicate jobs' immutable metadata
📣 This issue is not being implemented in CI storage optimization - Phase 2: write to new... (&18268) 📣
Problem
p_ci_builds_metadata is growing very fast. A lot of the data persisted in this table is identical across job runs and pipelines.
Proposal
- Move immutable job processing data from p_ci_builds_metadata into a new table, p_ci_job_prototypes.
- Move mutable processing data into a new p_ci_job_processing table, and delete it after the job completes.
- Move intrinsic job data from p_ci_builds_metadata into new columns in p_ci_builds.
| Mutable? | Intrinsic/processing | Destination |
|---|---|---|
| No | Intrinsic | p_ci_builds |
| Yes | Intrinsic | p_ci_builds |
| No | Processing | p_ci_job_prototypes |
| Yes | Processing | p_ci_job_processing |
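The routing rule in the table is a two-input decision: mutability only matters for processing data. A minimal Python sketch of that mapping, where the Kind enum and the destination function are hypothetical names used only for illustration:

```python
from enum import Enum


class Kind(Enum):
    """Whether an attribute is intrinsic to the job or only used while processing it."""
    INTRINSIC = "intrinsic"
    PROCESSING = "processing"


def destination(mutable: bool, kind: Kind) -> str:
    """Return the destination table for a p_ci_builds_metadata attribute.

    Intrinsic data always goes to p_ci_builds; processing data is split
    by mutability between the prototypes and processing tables.
    """
    if kind is Kind.INTRINSIC:
        return "p_ci_builds"
    return "p_ci_job_processing" if mutable else "p_ci_job_prototypes"
```

Classifying each existing column is the open question tracked in the analysis issue; the function only encodes the table once that classification is known.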
Analysis of the data in p_ci_builds_metadata is currently being done in "Ensure ci_builds_metadata contains only process..." (#271615 - closed).
Goal: stop creating new records in p_ci_builds_metadata.
Approach
- Create a new table, p_ci_job_prototypes, to store only the immutable processing data currently in p_ci_builds_metadata.
- Add new columns in p_ci_builds for the intrinsic mutable data currently in p_ci_builds_metadata.
- Create a new table, p_ci_job_processing, to store only the mutable processing data currently in p_ci_builds_metadata.
- When jobs are created:
  - Immutable data is persisted in p_ci_job_prototypes in a deduplicated way.
    - Equivalent data in p_ci_builds_metadata is left NULL.
  - Intrinsic mutable data is persisted in p_ci_builds.
  - Processing data created during job processing is stored in p_ci_job_processing.
- Add application code that prefers reading immutable data from p_ci_job_prototypes, falling back to p_ci_builds_metadata otherwise.
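The proposal does not say how prototypes are deduplicated. One possible approach, shown here as an assumption rather than the planned design, is to hash a canonical serialization of the immutable attributes and reuse any prototype with the same hash. PrototypeStore, find_or_create and prototype_checksum are hypothetical names, and the in-memory dicts stand in for the table:

```python
import hashlib
import json
from typing import Any


def prototype_checksum(attributes: dict[str, Any]) -> str:
    """Hash a canonical JSON encoding so equal attribute sets get equal checksums."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PrototypeStore:
    """Hypothetical in-memory stand-in for p_ci_job_prototypes, keyed by checksum."""

    def __init__(self) -> None:
        self._ids_by_checksum: dict[str, int] = {}
        self.rows: dict[int, dict[str, Any]] = {}

    def find_or_create(self, attributes: dict[str, Any]) -> int:
        """Return the id of an identical prototype, creating one only if none exists."""
        checksum = prototype_checksum(attributes)
        if checksum not in self._ids_by_checksum:
            new_id = len(self.rows) + 1
            self._ids_by_checksum[checksum] = new_id
            self.rows[new_id] = dict(attributes)
        return self._ids_by_checksum[checksum]
```

Sorting keys before hashing means two jobs whose attributes differ only in key order share one prototype. In PostgreSQL the same idea could plausibly be backed by a unique index on a checksum column together with INSERT ... ON CONFLICT, so concurrent job creation does not produce duplicate prototypes.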
Modeling:
- A ci_build belongs_to :ci_job_prototype
- A ci_job_prototype has_many :ci_builds
- A ci_build has_one :ci_job_processing
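The associations, together with the "prefer prototypes, fall back to metadata" read path, can be sketched with plain Python dataclasses standing in for the ActiveRecord models. All class, field and method names here (JobPrototype, JobProcessing, Build, read_immutable, builds_of) are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class JobPrototype:
    """Stand-in for a p_ci_job_prototypes row: deduplicated immutable data."""
    id: int
    attributes: dict[str, Any]


@dataclass
class JobProcessing:
    """Stand-in for a p_ci_job_processing row: mutable processing data."""
    attributes: dict[str, Any] = field(default_factory=dict)


@dataclass
class Build:
    """Stand-in for a p_ci_builds row (hypothetical field names)."""
    id: int
    prototype: Optional[JobPrototype] = None    # belongs_to :ci_job_prototype
    processing: Optional[JobProcessing] = None  # has_one :ci_job_processing
    legacy_metadata: dict[str, Any] = field(default_factory=dict)  # p_ci_builds_metadata

    def read_immutable(self, key: str) -> Any:
        """Prefer the prototype; fall back to legacy metadata for older jobs."""
        if self.prototype is not None:
            return self.prototype.attributes.get(key)
        return self.legacy_metadata.get(key)


def builds_of(prototype: JobPrototype, builds: list[Build]) -> list[Build]:
    """has_many :ci_builds, expressed as the inverse of Build.prototype."""
    return [b for b in builds if b.prototype is not None and b.prototype.id == prototype.id]
```

A job created after the change reads from its prototype, while a job created before it keeps reading from its legacy metadata until that data is migrated.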
Migration of existing data from p_ci_builds_metadata partitions:
- Insert immutable data into p_ci_job_prototypes.
- Insert intrinsic mutable data into p_ci_builds.
- Truncate the p_ci_builds_metadata table (on GitLab.com we will be truncating each migrated partition).
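The per-partition migration steps might look like the following sketch. It runs in memory on plain dicts; the column names (build_id, job_prototype_id) and the split of keys into immutable and intrinsic sets are hypothetical, and a real migration would presumably run in batches against PostgreSQL rather than in Python:

```python
from typing import Any, Callable


def migrate_partition(
    metadata_rows: list[dict[str, Any]],
    builds: dict[int, dict[str, Any]],
    find_or_create_prototype: Callable[[dict[str, Any]], int],
    immutable_keys: set[str],
    intrinsic_keys: set[str],
) -> None:
    """Move one p_ci_builds_metadata partition into the new layout, then empty it."""
    for row in metadata_rows:
        build = builds[row["build_id"]]
        # 1. Immutable data becomes (or reuses) a deduplicated prototype.
        immutable = {key: row[key] for key in immutable_keys if key in row}
        build["job_prototype_id"] = find_or_create_prototype(immutable)
        # 2. Intrinsic data is copied onto the build row itself.
        for key in intrinsic_keys:
            if key in row:
                build[key] = row[key]
    # 3. Stand-in for truncating the migrated partition.
    metadata_rows.clear()
```

Passing the prototype lookup in as a callable keeps the migration independent of how deduplication is implemented.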
Edited by Fabio Pitino