Migration Style Guide

When writing migrations for GitLab, you have to take into account that these are run by hundreds of thousands of organizations of all sizes, some with many years of data in their database.

In addition, having to take a server offline for an upgrade small or big is a big burden for most organizations. For this reason, it is important that your migrations are written carefully, can be applied online, and adhere to the style guide below.

Migrations are not allowed to require GitLab installations to be taken offline ever. Migrations always must be written in such a way to avoid downtime. In the past we had a process for defining migrations that allowed for downtime by setting a DOWNTIME constant. You may see this when looking at older migrations. This process was in place for 4 years without ever being used and as such we’ve learned we can always figure out how to write a migration differently to avoid downtime.

When writing your migrations, also consider that databases might have stale data or inconsistencies and guard for that. Try to make as few assumptions as possible about the state of the database.

Please don’t depend on GitLab-specific code since it can change in future versions. If needed copy-paste GitLab code into the migration to make it forward compatible.

Choose an appropriate migration type

The first step before adding a new migration should be to decide which type is most appropriate.

There are currently three kinds of migrations you can create, depending on the kind of work it needs to perform and how long it takes to complete:

  1. Regular schema migrations. These are traditional Rails migrations in db/migrate that run before new application code is deployed (for GitLab.com before Canary is deployed). This means that they should be relatively fast, no more than a few minutes, so as not to unnecessarily delay a deployment.

    One exception is a migration that takes longer but is absolutely critical for the application to operate correctly. For example, you might have indices that enforce unique tuples, or that are needed for query performance in critical parts of the application. In cases where the migration would be unacceptably slow, however, a better option might be to guard the feature with a feature flag and perform a post-deployment migration instead. The feature can then be turned on after the migration finishes.

  2. Post-deployment migrations. These are Rails migrations in db/post_migrate and run after new application code has been deployed (for GitLab.com after the production deployment has finished). They can be used for schema changes that aren’t critical for the application to operate, or data migrations that take at most a few minutes. Common examples for schema changes that should run post-deploy include:
    • Clean-ups, like removing unused columns.
    • Adding non-critical indices on high-traffic tables.
    • Adding non-critical indices that take a long time to create.
  3. Batched background migrations. These aren’t regular Rails migrations, but application code that is executed via Sidekiq jobs, although a post-deployment migration is used to schedule them. Use them only for data migrations that exceed the timing guidelines for post-deploy migrations. Batched background migrations should not change the schema.

Use the following diagram to guide your decision, but keep in mind that it is just a tool, and the final outcome will always be dependent on the specific changes being made:

graph LR A{Schema<br/>changed?} A -->|Yes| C{Critical to<br/>speed or<br/>behavior?} A -->|No| D{Is it fast?} C -->|Yes| H{Is it fast?} C -->|No| F[Post-deploy migration] H -->|Yes| E[Regular migration] H -->|No| I[Post-deploy migration<br/>+ feature flag] D -->|Yes| F[Post-deploy migration] D -->|No| G[Background migration]

How long a migration should take

In general, all migrations for a single deploy shouldn’t take longer than 1 hour for GitLab.com. The following guidelines are not hard rules, they were estimated to keep migration duration to a minimum.

note
Keep in mind that all durations should be measured against GitLab.com.
Migration Type Recommended Duration Notes
Regular migrations <= 3 minutes A valid exception are changes without which application functionality or performance would be severely degraded and which cannot be delayed.
Post-deployment migrations <= 10 minutes A valid exception are schema changes, since they must not happen in background migrations.
Background migrations > 10 minutes Since these are suitable for larger tables, it’s not possible to set a precise timing guideline, however, any single query must stay below 1 second execution time with cold caches.

Decide which database to target

GitLab connects to two different Postgres databases: main and ci. This split can affect migrations as they may run on either or both of these databases.

Read Migrations for Multiple databases to understand if or how a migration you add should account for this.

Create a regular schema migration

To create a migration you can use the following Rails generator:

bundle exec rails g migration migration_name_here

This generates the migration file in db/migrate.

Schema Changes

Changes to the schema should be committed to db/structure.sql. This file is automatically generated by Rails when you run bundle exec rails db:migrate, so you normally should not edit this file by hand. If your migration is adding a column to a table, that column is added at the bottom. Please do not reorder columns manually for existing tables as this causes confusion to other people using db/structure.sql generated by Rails.

note
Creating an index asynchronously requires two merge requests. When done, commit the schema change in the merge request that adds the index with add_concurrent_index.

When your local database in your GDK is diverging from the schema from main it might be hard to cleanly commit the schema changes to Git. In that case you can use the scripts/regenerate-schema script to regenerate a clean db/structure.sql for the migrations you’re adding. This script applies all migrations found in db/migrate or db/post_migrate, so if there are any migrations you don’t want to commit to the schema, rename or remove them. If your branch is not targeting main you can set the TARGET environment variable.

# Regenerate schema against `main`
scripts/regenerate-schema

# Regenerate schema against `12-9-stable-ee`
TARGET=12-9-stable-ee scripts/regenerate-schema

The scripts/regenerate-schema script can create additional differences. If this happens, use a manual procedure where <migration ID> is the DATETIME part of the migration file.

# Rebase against master
git rebase master

# Rollback changes
VERSION=<migration ID> bundle exec rails db:rollback:main

# Checkout db/structure.sql from master
git checkout origin/master db/structure.sql

# Migrate changes
VERSION=<migration ID> bundle exec rails db:migrate:main

Avoiding downtime

The document “Avoiding downtime in migrations” specifies various database operations, such as:

and explains how to perform them without requiring downtime.

Reversibility

Your migration must be reversible. This is very important, as it should be possible to downgrade in case of a vulnerability or bugs.

In your migration, add a comment describing how the reversibility of the migration was tested.

Some migrations cannot be reversed. For example, some data migrations can’t be reversed because we lose information about the state of the database before the migration. You should still create a down method with a comment, explaining why the changes performed by the up method can’t be reversed, so that the migration itself can be reversed, even if the changes performed during the migration can’t be reversed:

def down
  # no-op

  # comment explaining why changes performed by `up` cannot be reversed.
end

Migrations like this are inherently risky and additional actions are required when preparing the migration for review.

Atomicity

By default, migrations are single transaction. That is, a transaction is opened at the beginning of the migration, and committed after all steps are processed.

Running migrations in a single transaction makes sure that if one of the steps fails, none of the steps are executed, leaving the database in valid state. Therefore, either:

  • Put all migrations in one single-transaction migration.
  • If necessary, put most actions in one migration and create a separate migration for the steps that cannot be done in a single transaction.

For example, if you create an empty table and need to build an index for it, it is recommended to use a regular single-transaction migration and the default rails schema statement: add_index. This is a blocking operation, but it doesn’t cause problems because the table is not yet used, and therefore it does not have any records yet.

Naming conventions

We keep column names consistent with ActiveRecord’s schema conventions.

Custom index names should follow the pattern index_#{table_name}_on_#{column_1}_and_#{column_2}_#{condition}.

Examples:

  • index_services_on_type_and_id_and_template_when_active
  • index_projects_on_id_service_desk_enabled
  • index_clusters_on_enabled_cluster_type_id_and_created_at

Truncate long index names

PostgreSQL limits the length of identifiers, like column or index names. Column names are not usually a problem, but index names tend to be longer. Some methods for shortening a name that’s too long:

  • Prefix it with i_ instead of index_.
  • Skip redundant prefixes. For example, index_vulnerability_findings_remediations_on_vulnerability_remediation_id becomes index_vulnerability_findings_remediations_on_remediation_id.
  • Instead of columns, specify the purpose of the index, such as index_users_for_unconfirmation_notification.

Heavy operations in a single transaction

When using a single-transaction migration, a transaction holds a database connection for the duration of the migration, so you must make sure the actions in the migration do not take too much time: GitLab.com’s production database has a 15s timeout, so in general, the cumulative execution time in a migration should aim to fit comfortably in that limit. Singular query timings should fit within the standard limit

In case you need to insert, update, or delete a significant amount of data, you:

  • Must disable the single transaction with disable_ddl_transaction!.
  • Should consider doing it in a Background Migration.

Migration helpers and versioning

Introduced in GitLab 14.3.

Various helper methods are available for many common patterns in database migrations. Those helpers can be found in Gitlab::Database::MigrationHelpers and related modules.

In order to allow changing a helper’s behavior over time, we implement a versioning scheme for migration helpers. This allows us to maintain the behavior of a helper for already existing migrations but change the behavior for any new migrations.

For that purpose, all database migrations should inherit from Gitlab::Database::Migration, which is a “versioned” class. For new migrations, the latest version should be used (which can be looked up in Gitlab::Database::Migration::MIGRATION_CLASSES) to use the latest version of migration helpers.

In this example, we use version 1.0 of the migration class:

class TestMigration < Gitlab::Database::Migration[2.0]
  def change
  end
end

Do not include Gitlab::Database::MigrationHelpers directly into a migration. Instead, use the latest version of Gitlab::Database::Migration, which exposes the latest version of migration helpers automatically.

Migration helpers and versioning were introduced in GitLab 14.3. For merge requests targeting previous stable branches, use the old format and still inherit from ActiveRecord::Migration[6.1] instead of Gitlab::Database::Migration[2.0].

Retry mechanism when acquiring database locks

When changing the database schema, we use helper methods to invoke DDL (Data Definition Language) statements. In some cases, these DDL statements require a specific database lock.

Example:

def change
  remove_column :users, :full_name, :string
end

Executing this migration requires an exclusive lock on the users table. When the table is concurrently accessed and modified by other processes, acquiring the lock may take a while. The lock request is waiting in a queue and it may also block other queries on the users table once it has been enqueued.

<