
Cells: Classify: Make the uploads table attributable to an organization

Problem

The uploads table holds a record of every file uploaded to GitLab. It is attached to many models (users, projects, groups, etc.).

It is not clearly attributable as either cluster-wide or cell-local.

Some investigation of this problem was done in [Feature] Cells 1.0 impact for file uploads (#443573 - closed).

Geo

The same applies to the upload_states table, which Geo uses to track uploaded records that need verification.

Dependencies

The tables backing the models that use uploads must have their sharding keys defined first, so that each uploads partition can use the sharding key of its parent table:

  • abuse_reports
  • achievements
  • ai_vectorizable_files
  • alert_management_alert_metric_images
  • appearances
  • bulk_import_export_uploads
  • dependency_list_export_parts
  • dependency_list_exports
  • design_management_designs_versions
  • import_export_uploads
  • issuable_metric_images
  • namespaces
  • organization_details
  • project_relation_export_uploads
  • topics
  • projects
  • snippets
  • user_permission_export_uploads
  • users
  • vulnerability_archive_exports
  • vulnerability_export_parts
  • vulnerability_exports
  • vulnerability_remediations

https://docs.google.com/spreadsheets/d/19CcPaUGxOaT1rwjSdRvLkhu_-91RUBOdjDFGVxOonVs/edit?usp=sharing

| Dependency | Partition | Sharding key |
|---|---|---|
| abuse_reports | abuse_report_uploads | |
| achievements | achievement_uploads | namespace_id |
| ai_vectorizable_files | ai_vectorizable_file_uploads | project_id |
| alert_management_alert_metric_images | alert_management_alert_metric_image_uploads | project_id |
| appearances | appearance_uploads | schema: gitlab_main_cell_setting -> Cell-local, unsharded table |
| bulk_import_export_uploads | bulk_import_export_upload_uploads | project_id, namespace_id |
| dependency_list_export_parts | dependency_list_export_part_uploads | organization_id |
| dependency_list_exports | dependency_list_export_uploads | project_id, namespace_id, organization_id |
| design_management_designs_versions | design_management_action_uploads | namespace_id |
| import_export_uploads | import_export_upload_uploads | project_id, namespace_id |
| issuable_metric_images | issuable_metric_image_uploads | namespace_id |
| namespaces | namespace_uploads | namespace_id |
| organization_details | organization_detail_uploads | organization_id |
| project_relation_export_uploads | project_import_export_relation_export_upload_uploads | project_id |
| topics | project_topic_uploads | organization_id |
| projects | project_uploads | project_id |
| snippets | snippet_uploads | organization_id |
| user_permission_export_uploads | user_permission_export_upload_uploads | |
| users | user_uploads | model_id (users) |
| vulnerability_archive_exports | vulnerability_archive_export_uploads | project_id |
| vulnerability_export_parts | vulnerability_export_part_uploads | organization_id |
| vulnerability_exports | vulnerability_export_uploads | organization_id |
| vulnerability_remediations | vulnerability_remediation_uploads | project_id |
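To make the table concrete, here is a minimal Python sketch that encodes a few of its rows as data and resolves the sharding key of an upload row in a given partition. Only a subset of rows is copied. The `sharding_key_for` helper and the rule that exactly one of several candidate columns is set on each row are hypothetical assumptions for illustration, not GitLab's implementation.

```python
from __future__ import annotations

# Hypothetical subset of the dependency table: partition -> candidate sharding key columns.
PARTITION_SHARDING_KEYS: dict[str, tuple[str, ...]] = {
    "achievement_uploads": ("namespace_id",),
    "project_uploads": ("project_id",),
    "organization_detail_uploads": ("organization_id",),
    "bulk_import_export_upload_uploads": ("project_id", "namespace_id"),
    "dependency_list_export_uploads": ("project_id", "namespace_id", "organization_id"),
    "appearance_uploads": (),  # gitlab_main_cell_setting: cell-local, unsharded
}


def sharding_key_for(partition: str, row: dict[str, int | None]) -> tuple[str, int] | None:
    """Return (column, value) of the sharding key set on an upload row.

    Returns None for unsharded partitions. Assumes (hypothetically) that exactly
    one candidate column is non-NULL when a partition lists several.
    """
    columns = PARTITION_SHARDING_KEYS[partition]
    if not columns:
        return None
    present = [(c, row.get(c)) for c in columns if row.get(c) is not None]
    if len(present) != 1:
        raise ValueError(f"{partition}: expected exactly one of {columns} to be set, got {present}")
    return present[0]


print(sharding_key_for("bulk_import_export_upload_uploads", {"project_id": None, "namespace_id": 3}))
```

Multi-column rows such as bulk_import_export_upload_uploads are where the "exactly one is set" assumption matters: a row with none or several keys set cannot be attributed unambiguously, so the sketch rejects it instead of guessing.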

Solution

We should introduce a new table that is clearly either cluster-wide or cell-local, splitting the current table into two, each with a clear purpose.

Proposal

Based on the discussion in #398199 (comment 2101029924).

  • Milestone 17.7:
  • Milestone 17.11:
    • Create a new uploads_9ba88c4165 table (modeled on uploads), partitioned by model_type, and mark it as exempt_from_sharding: true (!175203 (merged))
    • Create a partition for each model_type in the public schema (!175203 (merged))
    • For each partition, create a foreign key referencing the table that holds its sharding key (!175203 (merged))
    • Start syncing uploads -> uploads_9ba88c4165 (!175203 (merged))
  • Milestone 18.2 (required stop):
    • Backfill uploads_9ba88c4165 once every related model has its sharding key ready (!181349 (merged))
  • Milestone 18.3:
  • Milestone 18.5:
  • Milestone 18.6 (work on all dependencies is completed)
    • Add database triggers for all partitions to set the sharding key if it is missing (!208858)
    • Truncate partitions (to remove orphaned uploads)
    • Re-run the backfill (updated to set the new sharding keys)
  • Milestone M (after a required stop)
    • Finalize the backfill
    • For each partition, create a NOT NULL constraint (!199513 (closed))
    • Define the sharding key for each partition
    • Switch the app to use the new partitioned table by swapping the table names
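The sync, trigger, truncate, and backfill steps above can be illustrated for a single partition with a small, self-contained Python sketch. It uses SQLite through the standard `sqlite3` module as a stand-in for PostgreSQL: SQLite has no declarative partitioning, so an ordinary `achievement_uploads` table plays the role of one partition, and because SQLite triggers cannot assign to `NEW`, the sharding key is filled by an `AFTER INSERT` update instead of a `BEFORE INSERT` assignment. The column set, the `'Achievements::Achievement'` model_type value, and the trigger names are hypothetical simplifications, and the sketch assumes the sync is trigger-based. It is not the migration code from the linked merge requests.

```python
import sqlite3

# Hypothetical, simplified schema; one plain table stands in for one partition.
SCHEMA = """
CREATE TABLE achievements (id INTEGER PRIMARY KEY, namespace_id INTEGER NOT NULL);
CREATE TABLE uploads (
    id INTEGER PRIMARY KEY, model_type TEXT NOT NULL,
    model_id INTEGER NOT NULL, path TEXT NOT NULL
);
-- Stand-in for the achievement_uploads partition, carrying the sharding key.
CREATE TABLE achievement_uploads (
    id INTEGER PRIMARY KEY, model_type TEXT NOT NULL,
    model_id INTEGER NOT NULL, path TEXT NOT NULL, namespace_id INTEGER
);
-- Sync: mirror new rows of the old table into the matching partition.
CREATE TRIGGER sync_achievement_uploads AFTER INSERT ON uploads
WHEN NEW.model_type = 'Achievements::Achievement'
BEGIN
    INSERT INTO achievement_uploads (id, model_type, model_id, path)
    VALUES (NEW.id, NEW.model_type, NEW.model_id, NEW.path);
END;
-- Set the sharding key from the parent row when the writer left it NULL.
CREATE TRIGGER set_achievement_uploads_namespace_id AFTER INSERT ON achievement_uploads
WHEN NEW.namespace_id IS NULL
BEGIN
    UPDATE achievement_uploads
    SET namespace_id = (SELECT namespace_id FROM achievements WHERE id = NEW.model_id)
    WHERE id = NEW.id;
END;
"""

# Backfill with the sharding key set explicitly; the inner join skips orphaned
# uploads whose parent row no longer exists.
BACKFILL = """
INSERT INTO achievement_uploads (id, model_type, model_id, path, namespace_id)
SELECT u.id, u.model_type, u.model_id, u.path, a.namespace_id
FROM uploads u JOIN achievements a ON a.id = u.model_id
WHERE u.model_type = 'Achievements::Achievement'
"""

SELECT_ROWS = "SELECT id, model_id, path, namespace_id FROM achievement_uploads ORDER BY id"


def demo() -> tuple[list, list]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO achievements (id, namespace_id) VALUES (1, 42)")
    # The application writes only to the old table; the triggers sync the rows.
    conn.execute(
        "INSERT INTO uploads (id, model_type, model_id, path) "
        "VALUES (10, 'Achievements::Achievement', 1, 'badge.png')"
    )
    # Orphan: achievement 99 does not exist, so its sharding key stays NULL.
    conn.execute(
        "INSERT INTO uploads (id, model_type, model_id, path) "
        "VALUES (11, 'Achievements::Achievement', 99, 'orphan.png')"
    )
    synced = conn.execute(SELECT_ROWS).fetchall()
    # Truncate the partition, then re-run the backfill.
    conn.execute("DELETE FROM achievement_uploads")
    conn.execute(BACKFILL)
    backfilled = conn.execute(SELECT_ROWS).fetchall()
    conn.close()
    return synced, backfilled


print(demo())
# ([(10, 1, 'badge.png', 42), (11, 99, 'orphan.png', None)], [(10, 1, 'badge.png', 42)])
```

In this sketch, the orphaned upload ends up with a NULL sharding key after syncing. Truncating the partition and re-running a backfill that inner-joins on the parent table drops it, which is what would let a later NOT NULL constraint on the sharding key succeed.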
Edited by Tomasz Skorupa