[go: up one dir, main page]

Skip to content

Introduce FF read_from_new_ci_destinations and update read logic for options/yaml_variables

What does this MR do and why?

Context

As part of our CI data reduction project (Reduce growth rate of ci_builds_metadata partit... (&13886)), we introduced a new table ci_job_definitions to help deduplicate job data. Since some data is moved to new destinations (e.g. intrinsic data moved to ci_builds and immutable processing data to ci_job_definitions) we need to be able to use feature flags to read/write data to specific destination.

In #552057 (closed), we ultimately want to move all necessary getter/setters to Ci::Metadatable so they're all in one place and the behaviour can be switched on/off based on feature flags.

This MR

These changes are based on the POC: !193948 (closed)

Since there are many columns/fields to migrate, I've decided to split this effort into multiple smaller MRs to simplify the review process. This MR is the first iteration of the implementation plan:

  1. Introduces the Feature flag:
  1. Applies the FF-controlled reading logic to the underlying methods for options and yaml_variables.
  2. Updates all places that preload :metadata so that it also preloads :job_definition, to avoid N+1 queries.

References

Database queries

NOTES:

  • We can't run these queries on Postgres.ai yet because the clone is delayed and the job_definitions table doesn't exist. So the query plans below are from local machine.
  • There isn't any job definition data yet, so the query plans prefer just seq scans for now. Later the queries should perform fine though because we're only filtering on each table's primary key.

Preloading only [:job_definition] association

Raw SQL
SELECT
    "p_ci_job_definition_instances"."job_id" AS t0_r0,
    "p_ci_job_definition_instances"."job_definition_id" AS t0_r1,
    "p_ci_job_definition_instances"."partition_id" AS t0_r2,
    "p_ci_job_definition_instances"."project_id" AS t0_r3,
    "p_ci_job_definitions"."id" AS t1_r0,
    "p_ci_job_definitions"."partition_id" AS t1_r1,
    "p_ci_job_definitions"."project_id" AS t1_r2,
    "p_ci_job_definitions"."created_at" AS t1_r3,
    "p_ci_job_definitions"."updated_at" AS t1_r4,
    "p_ci_job_definitions"."interruptible" AS t1_r5,
    "p_ci_job_definitions"."checksum" AS t1_r6,
    "p_ci_job_definitions"."config" AS t1_r7
FROM
    "p_ci_job_definition_instances"
    LEFT OUTER JOIN "p_ci_job_definitions" ON "p_ci_job_definitions"."partition_id" IS NOT NULL
    AND "p_ci_job_definitions"."id" = "p_ci_job_definition_instances"."job_definition_id"
    AND "p_ci_job_definitions"."partition_id" = "p_ci_job_definition_instances"."partition_id"
WHERE
    "p_ci_job_definition_instances"."partition_id" = 100
    AND "p_ci_job_definitions"."partition_id" = 100
    AND "p_ci_job_definition_instances"."job_id" IN (1422, 1421, 1420)
Query plan
 Nested Loop  (cost=0.00..0.01 rows=1 width=137) (actual time=0.002..0.002 rows=0 loops=1)
   Join Filter: (p_ci_job_definition_instances.job_definition_id = p_ci_job_definitions.id)
   ->  Seq Scan on ci_job_definition_instances_100 p_ci_job_definition_instances  (cost=0.00..0.00 rows=1 width=32) (actual time=0.002..0.002 rows=0 loops=1)
         Filter: ((partition_id = 100) AND (job_id = ANY ('{1422,1421,1420}'::bigint[])))
   ->  Seq Scan on ci_job_definitions_100 p_ci_job_definitions  (cost=0.00..0.00 rows=1 width=105) (never executed)
         Filter: ((partition_id IS NOT NULL) AND (partition_id = 100))
 Planning:
   Buffers: shared hit=6
 Planning Time: 0.150 ms
 Execution Time: 0.011 ms

Preloading [:job_definition_instance, :job_definition] separately

SELECT
    "p_ci_job_definition_instances".*
FROM
    "p_ci_job_definition_instances"
WHERE
    "p_ci_job_definition_instances"."partition_id" = 100
    AND "p_ci_job_definition_instances"."job_id" IN (1422, 1421, 1420)
Query plan
 Seq Scan on ci_job_definition_instances_100 p_ci_job_definition_instances  (cost=0.00..0.00 rows=1 width=32) (actual time=0.002..0.002 rows=0 loops=1)
   Filter: ((partition_id = 100) AND (job_id = ANY ('{1422,1421,1420}'::bigint[])))
 Planning:
   Buffers: shared hit=3
 Planning Time: 0.071 ms
 Execution Time: 0.007 ms
SELECT
    "p_ci_job_definitions".*
FROM
    "p_ci_job_definitions"
WHERE
    "p_ci_job_definitions"."partition_id" = 100
    AND "p_ci_job_definitions"."id" IN (1004, 1005, 1006)
Query plan
 Seq Scan on ci_job_definitions_100 p_ci_job_definitions  (cost=0.00..0.00 rows=1 width=105) (actual time=0.011..0.012 rows=0 loops=1)
   Filter: ((partition_id = 100) AND (id = ANY ('{1004,1005,1006}'::bigint[])))
 Planning:
   Buffers: shared hit=3
 Planning Time: 0.610 ms
 Execution Time: 0.032 ms

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #552057 (closed)

Edited by Leaminn Ma

Merge request reports

Loading