Cells: Classify: Make the uploads table attributable to an organization
Problem
The uploads table holds a record of every file uploaded to GitLab. It is attached to many models (users, projects, groups, etc.).
It cannot be clearly classified as either cluster-wide or cell-local.
The problem was partly investigated in [Feature] Cells 1.0 impact for file uploads (#443573 - closed).
Geo
The same applies to upload_states, which Geo uses to track uploaded records that need verification.
Dependencies
We need the tables backing the models using uploads to have their sharding keys so that we can use them.
- abuse_reports
- achievements
- ai_vectorizable_files
- alert_management_alert_metric_images
- appearances
- bulk_import_export_uploads
- dependency_list_export_parts
- dependency_list_exports
- design_management_designs_versions
- import_export_uploads
- issuable_metric_images
- namespaces
- organization_details
- project_relation_export_uploads
- topics
- projects
- snippets
- user_permission_export_uploads
- users
- vulnerability_archive_exports
- vulnerability_export_parts
- vulnerability_exports
- vulnerability_remediations
https://docs.google.com/spreadsheets/d/19CcPaUGxOaT1rwjSdRvLkhu_-91RUBOdjDFGVxOonVs/edit?usp=sharing
| Dependency | Partition | Sharding Key |
|---|---|---|
| abuse_reports | | |
| achievements | | |
| ai_vectorizable_files | | |
| alert_management_alert_metric_images | | |
| appearances | | schema: |
| bulk_import_export_uploads | | |
| dependency_list_export_parts | | |
| dependency_list_exports | | |
| design_management_designs_versions | | |
| import_export_uploads | | |
| issuable_metric_images | | |
| namespaces | | |
| organization_details | | |
| project_relation_export_uploads | | |
| topics | | |
| projects | | |
| snippets | | |
| user_permission_export_uploads | | |
| users | | |
| vulnerability_archive_exports | | |
| vulnerability_export_parts | | |
| vulnerability_exports | | |
| vulnerability_remediations | | |
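As a rough illustration of how this dependency list gates the work, the check below reports which backing tables still lack a sharding key. This is a minimal Python sketch, not GitLab code: the function names and the sample values are hypothetical, and the real status of each table is the one tracked in the table above.

```python
from typing import Dict, List, Optional


def tables_missing_sharding_key(sharding_keys: Dict[str, Optional[str]]) -> List[str]:
    """Return, in sorted order, the tables that still have no sharding key."""
    return sorted(name for name, key in sharding_keys.items() if not key)


def ready_for_backfill(sharding_keys: Dict[str, Optional[str]]) -> bool:
    """The back-fill can only start once every dependency has a sharding key."""
    return not tables_missing_sharding_key(sharding_keys)


# Placeholder data for illustration only; it does not reflect the actual
# state of these tables.
example_status = {"projects": "example_key", "topics": None}
print(tables_missing_sharding_key(example_status))
print(ready_for_backfill(example_status))
```

Keeping the check as a pure function over a table-to-key mapping means the same logic could be fed from the spreadsheet or from the table above without change.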
Solution
We should introduce a new table that is either cluster-wide or cell-local, and split this table into two, each with a clear purpose.
Proposal
Based on the discussion here - #398199 (comment 2101029924).
- Milestone 17.7:
  - Add new sharding key columns to uploads (!168003 (merged))
  - Update the app to populate sharding key columns for new uploads when available (!168003 (merged))
- Milestone 17.11:
  - Create new uploads_9ba88c4165 table (like uploads) partitioned by model_type, mark it as exempt_from_sharding: true (!175203 (merged))
  - Create partition for each model_type in the public schema (!175203 (merged))
  - For each partition create FK referencing the sharding key table (!175203 (merged))
  - Start syncing uploads -> uploads_9ba88c4165 (!175203 (merged))
- Milestone 18.2 (required stop):
  - Backfill uploads_9ba88c4165 when every related model has its sharding key ready (!181349 (merged))
- Milestone 18.3:
  - Finalize back-fill migration !198033 (merged)
- Milestone 18.5:
  - Clean up note_uploads (no longer needed after !185893 (merged)) (!206764 (merged))
- Milestone 18.6 (work on all dependencies is completed)
  - Add database triggers for all partitions to set sharding key if missing (!208858)
  - Truncate partitions (to remove orphaned uploads)
  - Re-run back-fill (updated to set new sharding keys)
- Milestone M (after a required stop)
  - Finalize back-fill
  - For each partition create NOT NULL constraint !199513 (closed)
    - NOT NULL constraint on appearance_uploads -> !209290 (merged)
  - Define sharding key for each partition
  - Switch the app to use the new partitioned table by swapping the table names
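
The core mechanics of the plan (routing rows to a partition by model_type, and a trigger that fills in a missing sharding key) can be pictured with a small in-memory model. This is only a Python sketch under stated assumptions, not the actual migrations or triggers: the class names, the single `sharding_key` field, and the `parent_keys` lookup are hypothetical stand-ins. Skipping rows whose parent cannot be found is also an assumption, loosely mirroring the removal of orphaned uploads.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Upload:
    id: int
    model_type: str  # partition key, as in the proposal
    model_id: int
    sharding_key: Optional[int] = None  # stand-in for the real sharding key column(s)


@dataclass
class PartitionedUploads:
    """Toy stand-in for a table list-partitioned by model_type."""

    # Hypothetical lookup: (model_type, model_id) -> sharding key of the parent row.
    parent_keys: Dict[Tuple[str, int], int]
    partitions: Dict[str, List[Upload]] = field(default_factory=dict)

    def _set_sharding_key_if_missing(self, upload: Upload) -> None:
        # Mimics a per-partition trigger: fills the key only when it is absent.
        if upload.sharding_key is None:
            upload.sharding_key = self.parent_keys.get(
                (upload.model_type, upload.model_id)
            )

    def insert(self, upload: Upload) -> bool:
        """Route the row to its model_type partition; skip orphaned rows."""
        self._set_sharding_key_if_missing(upload)
        if upload.sharding_key is None:
            return False  # no parent row found, so no sharding key can be set
        self.partitions.setdefault(upload.model_type, []).append(upload)
        return True


store = PartitionedUploads(parent_keys={("Project", 1): 42})
store.insert(Upload(id=1, model_type="Project", model_id=1))   # key filled from parent
store.insert(Upload(id=2, model_type="Project", model_id=99))  # orphan, skipped
```

In the real plan the equivalent guarantees come from the database itself: the per-partition FK, the trigger, and eventually the NOT NULL constraint.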