- General process
- Roles and process
- Distributing review workload
- How to prepare the merge request for a database review
- How to review for database
- Timing guidelines for migrations
This page is specific to database reviews. Please refer to our code review guide for broader advice and best practices for code review in general.
A database review is required for:
- Changes that touch the database schema or perform data migrations, including files in:
- Changes to the database tooling, for example:
  - migration or ActiveRecord helpers in
  - load balancing
- Changes that produce SQL queries that are beyond the obvious. It is generally up to the author of a merge request to decide whether or not complex queries are being introduced and if they require a database review.
A database reviewer is expected to look out for obviously complex queries in the change and review those more closely. If the author does not point out specific queries for review and there are no obviously complex queries, it is enough to concentrate on reviewing the migration only.
It is preferable to review queries in SQL form, and it is generally accepted to ask the author to translate any ActiveRecord queries into SQL form for review.
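For example, in a Rails console, `to_sql` turns an ActiveRecord relation into its SQL form (a sketch, not runnable standalone; the `Issue` model and filter values are illustrative):

```ruby
# Rails console sketch -- `Issue` and the filter values are illustrative.
Issue.where(project_id: 13083).order(created_at: :desc).limit(20).to_sql
# Returns a plain SQL string that can be pasted into the MR description
# and fed to EXPLAIN (ANALYZE, BUFFERS).
```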
A Merge Request author’s role is to:
- Decide whether a database review is needed.
- If database review is needed, add the ~database label.
- Prepare the merge request for a database review.
A database reviewer’s role is to:
- Perform a first-pass review on the MR and suggest improvements to the author.
- Once satisfied, relabel the MR with ~"database::reviewed", approve it, and reassign the MR to the database maintainer suggested by Reviewer Roulette.
Currently we have a critical shortage of database maintainers. Until we are able to increase the number of database maintainers to support the volume of reviews, we have implemented this temporary solution. If the database reviewer cannot find an available database maintainer then:
- Assign the MR to a database trainee maintainer for a second review.
- Once satisfied with the review process, and if a database maintainer is still not available, skip the database maintainer approval step and assign the merge request to a backend maintainer for final review and approval.
A database maintainer’s role is to:
- Perform the final database review on the MR.
- Discuss further improvements or other relevant changes with the database reviewer and the MR author.
- Finally, approve the MR and relabel it with ~"database::approved".
- Merge the MR if no other approvals are pending or pass it on to other maintainers as required (frontend, backend, docs).
Review workload is distributed using reviewer roulette (example). The MR author should then co-assign the suggested database reviewer. When the reviewer gives their sign-off, they hand the MR over to the suggested database maintainer.
If reviewer roulette didn't suggest a database reviewer and maintainer, make sure you have applied the ~database label and rerun the danger-review CI job, or pick someone from the
In order to make reviewing easier and therefore faster, please take the following preparations into account.
- Make migrations reversible by using the `change` method, or include a `down` method otherwise.
- Include either a rollback procedure or describe how to rollback changes.
- Add the output of the migration(s) to the MR description.
- Add tests for the migration in `spec/migrations` if necessary. See Testing Rails migrations at GitLab for more details.
- Write the raw SQL in the MR description. Preferably formatted nicely with sqlformat.darold.net or paste.depesz.com.
- Include the output of `EXPLAIN (ANALYZE, BUFFERS)` of the relevant queries in the description. If the output is too long, wrap it in `<details>` blocks, paste it in a GitLab Snippet, or provide the link to the plan at explain.depesz.com.
- When providing query plans, make sure they hit enough data:
  - You can use a GitLab production replica to test your queries on a large scale, through the `#database-lab` Slack channel or through ChatOps.
  - Usually, the namespace with `namespace_id = 9970` and the projects with `project_id = 13083` or `project_id = 278964` provide enough data to serve as a good example.
- For query changes, it is best to provide the SQL query along with a plan before and after the change. This helps to spot differences quickly.
- Include data that shows the performance improvement, preferably in the form of a benchmark.
- Include a migration to remove orphaned rows in the source table before adding the foreign key.
- Remove any instances of `dependent: ...` that may no longer be necessary.
- Order columns based on the Ordering Table Columns guidelines.
- Add foreign keys to any columns pointing to data in other tables, including an index.
- Add indexes for fields that are used in statements such as `GROUP BY` and other filtering or sorting clauses.
- Follow the guidelines on dropping columns.
- Generally it’s best practice, but not a hard rule, to remove indexes and foreign keys in a post-deployment migration.
- Exceptions include removing indexes and foreign keys for small tables.
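Tying several of the points above together, a reversible migration adding an index might look like the following sketch (it assumes GitLab's Rails migration helpers such as `add_concurrent_index`; the table and index names are made up, not from a real merge request):

```ruby
# Sketch only -- assumes the Rails/GitLab migration helpers; table and
# index names are illustrative.
class AddIndexOnEventsCreatedAt < ActiveRecord::Migration[6.0]
  include Gitlab::Database::MigrationHelpers

  # Concurrent index helpers must run outside a transaction.
  disable_ddl_transaction!

  INDEX_NAME = 'index_events_on_created_at'

  def up
    add_concurrent_index :events, :created_at, name: INDEX_NAME
  end

  # An explicit down method keeps the migration reversible.
  def down
    remove_concurrent_index_by_name :events, INDEX_NAME
  end
end
```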
- Check migrations
- Review relational modeling and design choices
- Review that migrations follow the database migration style guide, for example:
- Ensure that migrations execute in a transaction or only contain concurrent index/foreign key helpers (with transactions disabled)
- Check consistency with `db/schema.rb` and that migrations are reversible
- Check query timing (if any): queries executed in a migration need to fit comfortably within `15s` - preferably much less than that - on GitLab.com.
- For column removals, make sure the column has been ignored in a previous release
- Check background migrations:
- Establish a time estimate for execution on GitLab.com. For historical purposes, it's highly recommended to include this estimation in the merge request description.
- If a single `update` takes less than `1s`, the query can be placed directly in a regular migration.
- Background migrations are normally used for, but not limited to:
- Migrating data in larger tables.
- Making numerous SQL queries per record in a dataset.
- Review queries (for example, make sure batch sizes are fine)
- Because execution time can be longer than for a regular migration, it's suggested to treat background migrations as post migrations: place them in `db/post_migrate` instead of `db/migrate`. Keep in mind that post migrations are executed post-deployment in production.
- Check timing guidelines for migrations
- Check migrations are reversible and implement a `down` method
- Check data migrations:
- Establish a time estimate for execution on GitLab.com.
- Depending on timing, data migrations can be placed in regular, post-deploy, or background migrations.
- Data migrations should be reversible too or come with a description of how to reverse, when possible. This applies to all types of migrations (regular, post-deploy, background).
- Query performance
- Check for any obviously complex queries and queries the author specifically points out for review (if any)
- If not present yet, ask the author to provide SQL queries and query plans (e.g. by using chatops or direct database access)
- For given queries, review parameters regarding data distribution
- Check query plans and suggest improvements to queries (changing the query, schema or adding indexes and similar)
- The general guideline is for queries to come in below `100ms` execution time
- If queries rely on prior migrations that are not present yet on production (e.g. indexes, columns), you can use a one-off instance from the restore pipeline in order to establish a proper testing environment.
- Avoid N+1 problems and minimize the query count.
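The batching concern in the checklist above can be sketched in plain Ruby (a simulation only; real migrations would batch with database helpers such as `each_batch`, and the sizes here are illustrative):

```ruby
# Simulate splitting a large id range into bounded batches so that each
# UPDATE touches a limited number of rows (sizes are illustrative).
def batch_ranges(min_id, max_id, batch_size)
  (min_id..max_id).step(batch_size).map do |start_id|
    [start_id, [start_id + batch_size - 1, max_id].min]
  end
end

ranges = batch_ranges(1, 2_500, 1_000)
# Each range would back one bounded query, for example:
#   UPDATE issues SET state_id = 1 WHERE id BETWEEN 1 AND 1000
puts ranges.inspect
```

Small, predictable batches keep individual query time low and make it easy to estimate total runtime from a single measured batch.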
In general, migrations for a single deploy shouldn’t take longer than 1 hour for GitLab.com. The following guidelines are not hard rules, they were estimated to keep migration timing to a minimum.
| Migration Type | Recommended Execution Time | Notes |
|----------------|----------------------------|-------|
| Regular migrations on | | A valid exception is index creation, as this can take a long time. |
| Post migrations on | | |
| Background migrations | — | Since these are suitable for larger tables, it's not possible to set a precise timing guideline; however, any query must stay well below |
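One way to establish a rough time estimate is to measure a single representative batch and extrapolate; in plain Ruby (the `sleep` stands in for real migration work, and all numbers are hypothetical):

```ruby
require 'benchmark'

# Hypothetical numbers: time one batch, then extrapolate to the table size.
rows_total = 10_000_000
batch_size = 10_000

seconds_per_batch = Benchmark.realtime do
  sleep 0.01 # stand-in for executing one batched UPDATE
end

batches = (rows_total.to_f / batch_size).ceil
estimate_minutes = (seconds_per_batch * batches) / 60.0
puts format('~%.1f minutes for %d batches', estimate_minutes, batches)
```

The real measurement would come from running one batch against a production-sized replica, as described in the preparation section above.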