Proposal: Deduplicate jobs' immutable metadata

📣 this issue is not being implemented in CI storage optimization - Phase 2: write to new... (&18268) 📣

Problem

p_ci_builds_metadata is growing very fast. A lot of the data persisted in this table is identical across job runs and pipelines.

Proposal

  • Move immutable job processing data from p_ci_builds_metadata into a new table, p_ci_job_prototypes.
  • Move mutable processing data to a new p_ci_job_processing table and delete it after the job completes.
  • Move intrinsic mutable job data from p_ci_builds_metadata into new columns in p_ci_builds.

| Mutable? | Intrinsic/processing | Destination         |
|----------|----------------------|---------------------|
| No       | Intrinsic            | p_ci_builds         |
| Yes      | Intrinsic            | p_ci_builds         |
| No       | Processing           | p_ci_job_prototypes |
| Yes      | Processing           | p_ci_job_processing |
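
The routing in the table above can be written as a small lookup. This is only an illustrative sketch: the `Kind` enum and the `destination` function are hypothetical names, not part of the GitLab codebase.

```python
from enum import Enum


class Kind(Enum):
    """Whether a metadata field describes the job itself or how it is processed."""
    INTRINSIC = "intrinsic"
    PROCESSING = "processing"


def destination(mutable: bool, kind: Kind) -> str:
    """Return the table a field is routed to, following the proposal's table."""
    if kind is Kind.INTRINSIC:
        # Intrinsic data goes to p_ci_builds whether or not it is mutable.
        return "p_ci_builds"
    # Processing data is split by mutability.
    return "p_ci_job_processing" if mutable else "p_ci_job_prototypes"
```

Only processing data is split by mutability; the `mutable` flag has no effect on intrinsic data.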

Analysis of the data in p_ci_builds_metadata is currently being done in Ensure ci_builds_metadata contains only process... (#271615 - closed).

Goal: essentially, we stop creating new records in p_ci_builds_metadata.

Approach

  1. Create a new table, p_ci_job_prototypes, to store only the immutable processing data currently in p_ci_builds_metadata.
  2. Add new columns in p_ci_builds for the intrinsic mutable data currently in p_ci_builds_metadata.
  3. Create a new table, p_ci_job_processing, to store only the mutable processing data currently in p_ci_builds_metadata.
  4. When jobs are created:
    1. Immutable data is persisted in p_ci_job_prototypes in a deduplicated way.
      1. Equivalent data in p_ci_builds_metadata is left NULL.
    2. Intrinsic mutable data is persisted in p_ci_builds.
    3. Processing data created while the job is being processed is stored in p_ci_job_processing.
  5. Add application code that prefers reading immutable data from the prototypes and falls back to p_ci_builds_metadata otherwise.
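
Steps 4 and 5 can be sketched with an in-memory model. The proposal does not say how deduplication is achieved; the SHA-256 checksum over canonical JSON used here is an assumption, and `JobStore`, its methods, and the sample attribute names are all hypothetical.

```python
import hashlib
import json


class JobStore:
    """In-memory stand-in for the tables; every name here is hypothetical."""

    def __init__(self):
        self.prototypes = {}       # checksum -> immutable data (p_ci_job_prototypes)
        self.builds = {}           # build id -> intrinsic data (p_ci_builds)
        self.processing = {}       # build id -> processing data (p_ci_job_processing)
        self.legacy_metadata = {}  # build id -> immutable data (p_ci_builds_metadata)

    @staticmethod
    def checksum(attrs):
        # Assumption: identical immutable data is detected by hashing a
        # canonical (key-sorted) JSON serialization.
        canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def create_job(self, build_id, immutable, intrinsic, processing):
        key = self.checksum(immutable)
        # Step 4.1: store the immutable data once; later jobs reuse the row.
        self.prototypes.setdefault(key, immutable)
        # Step 4.2: intrinsic data lives on the build, with a prototype reference.
        self.builds[build_id] = {**intrinsic, "prototype_checksum": key}
        # Step 4.3: processing data is kept separately.
        self.processing[build_id] = dict(processing)

    def finish_job(self, build_id):
        # Processing data is deleted after the job completes.
        self.processing.pop(build_id, None)

    def immutable_data(self, build_id):
        # Step 5: prefer the prototype, fall back to legacy metadata.
        key = self.builds.get(build_id, {}).get("prototype_checksum")
        if key is not None:
            return self.prototypes[key]
        return self.legacy_metadata.get(build_id)
```

In this sketch, two jobs created with the same immutable attributes share one prototype entry, and a build that predates the change (no `prototype_checksum`) is still readable through `legacy_metadata`.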

Modeling:

  • A ci_build belongs_to: ci_job_prototype
  • A ci_job_prototype has_many: ci_builds
  • A ci_build has_one: ci_job_processing
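
The associations above can be mirrored with plain data classes. This is a rough sketch of the cardinalities only; the class and field names are hypothetical and do not reflect the actual Rails models.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobPrototype:
    id: int
    # has_many: every build that shares this immutable data.
    builds: List["Build"] = field(default_factory=list, repr=False, compare=False)


@dataclass
class JobProcessing:
    build_id: int


@dataclass
class Build:
    id: int
    prototype: JobPrototype                     # belongs_to: exactly one prototype
    processing: Optional[JobProcessing] = None  # has_one: absent once deleted

    def __post_init__(self):
        # Keep the has_many side in sync with the belongs_to side.
        self.prototype.builds.append(self)
```

Many builds point at one prototype, which is where the deduplication saving comes from, while each build has at most one processing record.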

Migration of existing data from p_ci_builds_metadata partitions:

  1. Insert immutable data into p_ci_job_prototypes
  2. Insert intrinsic mutable data into p_ci_builds
  3. Truncate the p_ci_builds_metadata table (on GitLab.com, we will truncate each partition once it has been migrated).
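
A minimal sketch of the per-partition migration, using dictionaries in place of tables. The row layout (`"immutable"` and `"intrinsic"` keys), the checksum-based deduplication, and the function names are assumptions made for illustration; a real migration would run as batched database work.

```python
import hashlib
import json


def checksum(attrs):
    # Assumed deduplication key: hash of a canonical JSON serialization.
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def migrate_partition(partition, prototypes, builds):
    """Migrate one metadata partition, then empty it.

    partition:  build id -> {"immutable": {...}, "intrinsic": {...}}
    prototypes: checksum -> immutable data (stands in for p_ci_job_prototypes)
    builds:     build id -> build row (stands in for p_ci_builds)
    """
    for build_id, row in partition.items():
        key = checksum(row["immutable"])
        # Step 1: insert immutable data, once per distinct value.
        prototypes.setdefault(key, row["immutable"])
        # Step 2: copy intrinsic data onto the existing build row.
        builds.setdefault(build_id, {}).update(row["intrinsic"], prototype_checksum=key)
    # Step 3: truncate the partition only after every row has been migrated.
    partition.clear()
```

Clearing the partition last means a failure partway through leaves the source data intact, so the migration can be retried.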
Edited by Fabio Pitino