Sharding guidelines

The sharding initiative is a long-running project to ensure that most GitLab database tables can be related to an Organization, either directly or indirectly. This involves adding an organization_id, namespace_id or project_id column to tables, and backfilling their NOT NULL fallback data. This work is important for the delivery of Cells and Organizations. For more information, see the design goals of Organizations.

Sharding principles

Follow this guidance to complete the remaining sharding key work and resolve outstanding issues.

Use unique issues for each table

We have a number of tables which share an issue. For example, eight tables point to the same issue here. This makes tracking progress and resolving blockers difficult. You should break out these shared issues into a single one per table, and update the YAML files to match.

Update unresolved, closed issues

Some of the issues linked in the database YAML docs have been closed, sometimes in favor of new issues, but the YAML files still point to the original URL. You should update these to point to the correct items to ensure we’re accurately measuring progress.

Add more information to sharding issues

Every sharding issue should have an assignee, an associated milestone, and should link to blockers, if applicable. This helps us plan the work and estimate completion dates. It also ensures each issue names someone to contact in the case of problems or concerns. It also helps us to visualize the project work by highlighting blocker issues so we can help resolve them.

Note that a blocker can be a dependency. For example, the notes table needs to be fully migrated before other tables can proceed. Any downstream issues should mark the related item as a blocker to help us understand these relationships.

Tables marked exempt_from_sharding should be sharded

This section was moved to another location.