Backfill jira_tracker_data and validate not null constraint
What does this MR do and why?
This MR backfills existing jira_tracker_data records that have both group/project_id and organization_id set. This is a similar problem to one we had with integrations table Backfill integrations and validate not null con... (!204744 - merged) so the solution is similar:
- Create temp indexes to support backfill of records that have project/group_id & organization id set
- Add not valid not null constraint for the duration of the backfill (to prevent new invalid records from being created for the duration of the backfill)
- Backfill the records
- Validate multi column not null constraint
- Remove temp indexes
We have 2 problematic records on .com and ~200 rows to backfill on staging.
Query plans for backfills
Group records
Initial query
EXPLAIN SELECT "jira_tracker_data"."id" FROM "jira_tracker_data" WHERE "jira_tracker_data"."group_id" IS NOT NULL AND "jira_tracker_data"."organization_id" IS NOT NULL ORDER BY "jira_tracker_data"."id" ASC LIMIT 1;
Limit (cost=0.12..2.89 rows=1 width=8) (actual time=0.008..0.009 rows=0 loops=1)
Buffers: shared hit=1
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using tmp_idx_jira_tracker_data_on_id_group_id_organization_id on public.jira_tracker_data (cost=0.12..2.89 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
Heap Fetches: 0
Buffers: shared hit=1
I/O Timings: read=0.000 write=0.000
Settings: effective_cache_size = '472585MB', jit = 'off', work_mem = '100MB', random_page_cost = '1.5', seq_page_cost = '4'
Batch query
EXPLAIN SELECT "jira_tracker_data"."id" FROM "jira_tracker_data" WHERE "jira_tracker_data"."group_id" IS NOT NULL AND "jira_tracker_data"."organization_id" IS NOT NULL AND "jira_tracker_data"."id" >= 1 ORDER BY "jira_tracker_data"."id" ASC LIMIT 1 OFFSET 1000;
Limit (cost=2.89..5.65 rows=1 width=8) (actual time=0.040..0.041 rows=0 loops=1)
Buffers: shared hit=5
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using tmp_idx_jira_tracker_data_on_id_group_id_organization_id on public.jira_tracker_data (cost=0.12..2.89 rows=1 width=8) (actual time=0.039..0.039 rows=0 loops=1)
Index Cond: (jira_tracker_data.id >= 1)
Heap Fetches: 0
Buffers: shared hit=5
I/O Timings: read=0.000 write=0.000
Settings: effective_cache_size = '472585MB', jit = 'off', work_mem = '100MB', random_page_cost = '1.5', seq_page_cost = '4'
Update query
EXPLAIN UPDATE "jira_tracker_data" SET "organization_id" = NULL WHERE "jira_tracker_data"."group_id" IS NOT NULL AND "jira_tracker_data"."organization_id" IS NOT NULL AND "jira_tracker_data"."id" >= 1;
ModifyTable on public.jira_tracker_data (cost=0.12..2.89 rows=0 width=0) (actual time=0.024..0.025 rows=0 loops=1)
Buffers: shared hit=5
I/O Timings: read=0.000 write=0.000
-> Index Scan using tmp_idx_jira_tracker_data_on_id_group_id_organization_id on public.jira_tracker_data (cost=0.12..2.89 rows=1 width=14) (actual time=0.024..0.024 rows=0 loops=1)
Index Cond: (jira_tracker_data.id >= 1)
Filter: ((jira_tracker_data.group_id IS NOT NULL) AND (jira_tracker_data.organization_id IS NOT NULL))
Rows Removed by Filter: 0
Buffers: shared hit=5
I/O Timings: read=0.000 write=0.000
Settings: effective_cache_size = '472585MB', jit = 'off', work_mem = '100MB', random_page_cost = '1.5', seq_page_cost = '4'
Project records
Initial query
EXPLAIN UPDATE "jira_tracker_data" SET "organization_id" = NULL WHERE "jira_tracker_data"."project_id" IS NOT NULL AND "jira_tracker_data"."organization_id" IS NOT NULL AND "jira_tracker_data"."id" >= 6;
ModifyTable on public.jira_tracker_data (cost=0.12..2.89 rows=0 width=0) (actual time=0.029..0.030 rows=0 loops=1)
Buffers: shared hit=5
I/O Timings: read=0.000 write=0.000
-> Index Scan using tmp_idx_jira_tracker_data_on_id_project_id_organization_id on public.jira_tracker_data (cost=0.12..2.89 rows=1 width=14) (actual time=0.027..0.028 rows=0 loops=1)
Index Cond: (jira_tracker_data.id >= 6)
Filter: ((jira_tracker_data.project_id IS NOT NULL) AND (jira_tracker_data.organization_id IS NOT NULL))
Rows Removed by Filter: 0
Buffers: shared hit=5
I/O Timings: read=0.000 write=0.000
Settings: effective_cache_size = '472585MB', jit = 'off', work_mem = '100MB', random_page_cost = '1.5', seq_page_cost = '4'
Batch query
EXPLAIN SELECT "jira_tracker_data"."id" FROM "jira_tracker_data" WHERE "jira_tracker_data"."project_id" IS NOT NULL AND "jira_tracker_data"."organization_id" IS NOT NULL AND "jira_tracker_data"."id" >= 6 ORDER BY "jira_tracker_data"."id" ASC LIMIT 1 OFFSET 1000;
Limit (cost=2.89..5.65 rows=1 width=8) (actual time=0.019..0.020 rows=0 loops=1)
Buffers: shared hit=5
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using tmp_idx_jira_tracker_data_on_id_project_id_organization_id on public.jira_tracker_data (cost=0.12..2.89 rows=1 width=8) (actual time=0.018..0.018 rows=0 loops=1)
Index Cond: (jira_tracker_data.id >= 6)
Heap Fetches: 0
Buffers: shared hit=5
I/O Timings: read=0.000 write=0.000
Settings: jit = 'off', work_mem = '100MB', random_page_cost = '1.5', seq_page_cost = '4', effective_cache_size = '472585MB'
Update query
EXPLAIN SELECT "jira_tracker_data"."id" FROM "jira_tracker_data" WHERE "jira_tracker_data"."project_id" IS NOT NULL AND "jira_tracker_data"."organization_id" IS NOT NULL ORDER BY "jira_tracker_data"."id" ASC LIMIT 1;
Limit (cost=0.12..2.89 rows=1 width=8) (actual time=0.005..0.006 rows=0 loops=1)
Buffers: shared hit=1
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using tmp_idx_jira_tracker_data_on_id_project_id_organization_id on public.jira_tracker_data (cost=0.12..2.89 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=1)
Heap Fetches: 0
Buffers: shared hit=1
I/O Timings: read=0.000 write=0.000
Settings: work_mem = '100MB', random_page_cost = '1.5', seq_page_cost = '4', effective_cache_size = '472585MB', jit = 'off'
References
Screenshots or screen recordings
How to set up and validate locally
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Edited by George Koltsov