[go: up one dir, main page]

Skip to content

Backfill integrations and validate not null constraint

What does this MR do and why?

This MR backfills existing integration records that have both group/project_id and organization_id set. This causes validation of the multi column not null constraint to fail, because at most 1 value must be present. As investigated in #541026 (comment 2744480120) there are 0.05% of records that have two sharding keys set. Since there are only a handful of records to fix (and no records to fix on .com, since we don't have any instance wide integrations to inherit the settings from) I am adding 2 post deployment migrations and 2 temp indexes to go along for the duration of the backfills.

The db migrations:

  1. Create temp indexes to support backfill of records that have project/group_id & organization id set
  2. Backfill the records
  3. Remove temp indexes
  4. Validate multi column not null constraint

Query plans for backfills

Group integrations

Initial query

explain SELECT "integrations"."id" FROM "integrations" WHERE "integrations"."instance" = FALSE AND "integrations"."group_id" IS NOT NULL AND "integrations"."organization_id" IS NOT NULL ORDER BY "integrations"."id" ASC LIMIT 1

 Limit  (cost=0.12..2.35 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=1)
   Buffers: shared hit=1
   I/O Timings: read=0.000 write=0.000
   ->  Index Only Scan using tmp_idx_integrations_on_id_instance_group_id_organization_id on public.integrations  (cost=0.12..2.35 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=1)
         Heap Fetches: 0
         Buffers: shared hit=1
         I/O Timings: read=0.000 write=0.000
Settings: effective_cache_size = '472585MB', jit = 'off', random_page_cost = '1.5', seq_page_cost = '4', work_mem = '100MB'

Batch query

explain SELECT "integrations"."id" FROM "integrations" WHERE "integrations"."instance" = FALSE AND "integrations"."group_id" IS NOT NULL AND "integrations"."organization_id" IS NOT NULL AND "integrations"."id" >= 25 ORDER BY "integrations"."id" ASC LIMIT 1 OFFSET 50

Limit  (cost=2.35..4.58 rows=1 width=4) (actual time=0.009..0.010 rows=0 loops=1)
   Buffers: shared hit=2
   I/O Timings: read=0.000 write=0.000
   ->  Index Only Scan using tmp_idx_integrations_on_id_instance_group_id_organization_id on public.integrations  (cost=0.12..2.35 rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=1)
         Index Cond: (integrations.id >= 25)
         Heap Fetches: 0
         Buffers: shared hit=2
         I/O Timings: read=0.000 write=0.000
Settings: random_page_cost = '1.5', seq_page_cost = '4', work_mem = '100MB', effective_cache_size = '472585MB', jit = 'off'

Update query

explain UPDATE "integrations" SET "organization_id" = NULL WHERE "integrations"."instance" = FALSE AND "integrations"."group_id" IS NOT NULL AND "integrations"."organization_id" IS NOT NULL AND "integrations"."id" >= 25

 ModifyTable on public.integrations  (cost=0.12..2.35 rows=0 width=0) (actual time=0.015..0.016 rows=0 loops=1)
   Buffers: shared hit=2
   I/O Timings: read=0.000 write=0.000
   ->  Index Scan using tmp_idx_integrations_on_id_instance_group_id_organization_id on public.integrations  (cost=0.12..2.35 rows=1 width=14) (actual time=0.014..0.015 rows=0 loops=1)
         Index Cond: (integrations.id >= 25)
         Filter: ((NOT integrations.instance) AND (integrations.group_id IS NOT NULL) AND (integrations.organization_id IS NOT NULL))
         Rows Removed by Filter: 0
         Buffers: shared hit=2
         I/O Timings: read=0.000 write=0.000
Settings: random_page_cost = '1.5', seq_page_cost = '4', work_mem = '100MB', effective_cache_size = '472585MB', jit = 'off'
Project integrations

Initial query

explain SELECT "integrations"."id" FROM "integrations" WHERE "integrations"."instance" = FALSE AND "integrations"."project_id" IS NOT NULL AND "integrations"."organization_id" IS NOT NULL ORDER BY "integrations"."id" ASC LIMIT 1

 Limit  (cost=0.12..2.35 rows=1 width=4) (actual time=0.009..0.010 rows=0 loops=1)
   Buffers: shared hit=1
   I/O Timings: read=0.000 write=0.000
   ->  Index Only Scan using tmp_idx_integrations_on_id_instance_project_id_organization_id on public.integrations  (cost=0.12..2.35 rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=1)
         Heap Fetches: 0
         Buffers: shared hit=1
         I/O Timings: read=0.000 write=0.000
Settings: seq_page_cost = '4', work_mem = '100MB', effective_cache_size = '472585MB', jit = 'off', random_page_cost = '1.5'

Batch query

explain SELECT "integrations"."id" FROM "integrations" WHERE "integrations"."instance" = FALSE AND "integrations"."project_id" IS NOT NULL AND "integrations"."organization_id" IS NOT NULL AND "integrations"."id" >= 30 ORDER BY "integrations"."id" ASC LIMIT 1 OFFSET 50

 Limit  (cost=2.35..4.58 rows=1 width=4) (actual time=0.012..0.013 rows=0 loops=1)
   Buffers: shared hit=2
   I/O Timings: read=0.000 write=0.000
   ->  Index Only Scan using tmp_idx_integrations_on_id_instance_project_id_organization_id on public.integrations  (cost=0.12..2.35 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=1)
         Index Cond: (integrations.id >= 30)
         Heap Fetches: 0
         Buffers: shared hit=2
         I/O Timings: read=0.000 write=0.000
Settings: random_page_cost = '1.5', seq_page_cost = '4', work_mem = '100MB', effective_cache_size = '472585MB', jit = 'off'

Update query

explain UPDATE "integrations" SET "organization_id" = NULL WHERE "integrations"."instance" = FALSE AND "integrations"."project_id" IS NOT NULL AND "integrations"."organization_id" IS NOT NULL AND "integrations"."id" >= 30

 ModifyTable on public.integrations  (cost=0.12..2.35 rows=0 width=0) (actual time=0.016..0.016 rows=0 loops=1)
   Buffers: shared hit=2
   I/O Timings: read=0.000 write=0.000
   ->  Index Scan using tmp_idx_integrations_on_id_instance_project_id_organization_id on public.integrations  (cost=0.12..2.35 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
         Index Cond: (integrations.id >= 30)
         Filter: ((NOT integrations.instance) AND (integrations.project_id IS NOT NULL) AND (integrations.organization_id IS NOT NULL))
         Rows Removed by Filter: 0
         Buffers: shared hit=2
         I/O Timings: read=0.000 write=0.000
Settings: random_page_cost = '1.5', seq_page_cost = '4', work_mem = '100MB', effective_cache_size = '472585MB', jit = 'off'

References

#541026 (closed)

Screenshots or screen recordings

integrations-backfill

Before After

How to set up and validate locally

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by George Koltsov

Merge request reports

Loading