What requires downtime?
- Adding Columns
- Dropping Columns
- Renaming Columns
- Changing Column Constraints
- Changing Column Types
- Adding Indexes
- Dropping Indexes
- Adding Tables
- Dropping Tables
- Adding Foreign Keys
- Removing Foreign Keys
- Data Migrations
When working with a database certain operations can be performed without taking GitLab offline, others do require a downtime period. This guide describes various operations, their impact, and how to perform them without requiring downtime.
On PostgreSQL you can safely add a new column to an existing table as long as it does not have a default value. For example, this query would not require downtime:
ALTER TABLE projects ADD COLUMN random_value int;
Add a column with a default however does require downtime. For example, consider this query:
ALTER TABLE projects ADD COLUMN random_value int DEFAULT 42;
This requires updating every single row in the
projects table so that
random_value is set to
42 by default. This requires updating all rows and
indexes in a table. This in turn acquires enough locks on the table for it to
effectively block any other queries.
As of MySQL 5.6 adding a column to a table is still quite an expensive
operation, even when using
LOCK=NONE. This means
downtime may be required when modifying large tables as otherwise the
operation could potentially take hours to complete.
Adding a column with a default value can be done without requiring downtime
when using the migration helper method
Gitlab::Database::MigrationHelpers#add_column_with_default. This method works
add_column except it updates existing rows in batches without
blocking access to the table being modified. See "Adding Columns With Default
Values" for more
information on how to use this method.
Removing columns is tricky because running GitLab processes may still be using the columns. To work around this you will need two separate merge requests and releases: one to ignore and then remove the column, and one to remove the ignore rule.
The first step is to ignore the column in the application code. This is
necessary because Rails caches the columns and re-uses this cache in various
places. This can be done by including the
IgnorableColumn module into the
model, followed by defining the columns to ignore. For example, to ignore
updated_at in the User model you'd use the following:
class User < ActiveRecord::Base include IgnorableColumn ignore_column :updated_at end
Once added you should create a post-deployment migration that removes the column. Both these changes should be submitted in the same merge request.
Once the changes from step 1 have been released & deployed you can set up a
separate merge request that removes the ignore rule. This merge request can
simply remove the
ignore_column line, and the
include IgnorableColumn line
if no other
ignore_column calls remain.
Renaming columns the normal way requires downtime as an application may continue using the old column name during/after a database migration. To rename a column without requiring downtime we need two migrations: a regular migration, and a post-deployment migration. Both these migration can go in the same release.
First we need to create the regular migration. This migration should use
Gitlab::Database::MigrationHelpers#rename_column_concurrently to perform the
renaming. For example
# A regular migration in db/migrate class RenameUsersUpdatedAtToUpdatedAtTimestamp < ActiveRecord::Migration include Gitlab::Database::MigrationHelpers disable_ddl_transaction! def up rename_column_concurrently :users, :updated_at, :updated_at_timestamp end def down cleanup_concurrent_column_rename :users, :updated_at_timestamp, :updated_at end end
This will take care of renaming the column, ensuring data stays in sync, copying over indexes and foreign keys, etc.
NOTE: if a column contains 1 or more indexes that do not contain the name of the original column, the above procedure will fail. In this case you will first need to rename these indexes.
The renaming procedure requires some cleaning up in a post-deployment migration.
We can perform this cleanup using
# A post-deployment migration in db/post_migrate class CleanupUsersUpdatedAtRename < ActiveRecord::Migration include Gitlab::Database::MigrationHelpers disable_ddl_transaction! def up cleanup_concurrent_column_rename :users, :updated_at, :updated_at_timestamp end def down rename_column_concurrently :users, :updated_at_timestamp, :updated_at end end
Adding or removing a NOT NULL clause (or another constraint) can typically be done without requiring downtime. However, this does require that any application changes are deployed first. Thus, changing the constraints of a column should happen in a post-deployment migration.
Changing the type of a column can be done using
method works similarly to
rename_column_concurrently. For example, let's say
we want to change the type of
A regular migration is used to create a new column with a temporary name along with setting up some triggers to keep data in sync. Such a migration would look as follows:
# A regular migration in db/migrate class ChangeUsersUsernameStringToText < ActiveRecord::Migration include Gitlab::Database::MigrationHelpers disable_ddl_transaction! def up change_column_type_concurrently :users, :username, :text end def down cleanup_concurrent_column_type_change :users, :username end end
Next we need to clean up our changes using a post-deployment migration:
# A post-deployment migration in db/post_migrate class ChangeUsersUsernameStringToTextCleanup < ActiveRecord::Migration include Gitlab::Database::MigrationHelpers disable_ddl_transaction! def up cleanup_concurrent_column_type_change :users end def down change_column_type_concurrently :users, :username, :string end end
And that's it, we're done!
Adding indexes is an expensive process that blocks INSERT and UPDATE queries for
the duration. When using PostgreSQL one can work arounds this by using the
CREATE INDEX CONCURRENTLY index_name ON projects (column_name);
Migrations can take advantage of this by using the method
add_concurrent_index. For example:
class MyMigration < ActiveRecord::Migration def up add_concurrent_index :projects, :column_name end def down remove_index(:projects, :column_name) if index_exists?(:projects, :column_name) end end
add_concurrent_index can not be reversed automatically, thus you
need to manually define
When running this on PostgreSQL the
CONCURRENTLY option mentioned above is
used. On MySQL this method produces a regular
CREATE INDEX query.
MySQL doesn't really have a workaround for this. Supposedly it can create indexes without the need for downtime but only for variable width columns. The details on this are a bit sketchy. Since it's better to be safe than sorry one should assume that adding indexes requires downtime on MySQL.
Dropping an index does not require downtime on both PostgreSQL and MySQL.
This operation is safe as there's no code using the table just yet.
Dropping tables can be done safely using a post-deployment migration, but only if the application no longer uses the table.
Adding foreign keys usually works in 3 steps:
- Start a transaction
ALTER TABLEto add the constraint(s)
- Check all existing data
ALTER TABLE typically acquires an exclusive lock until the end of a
transaction this means this approach would require downtime.
GitLab allows you to work around this by using
Gitlab::Database::MigrationHelpers#add_concurrent_foreign_key. This method
ensures that when PostgreSQL is used no downtime is needed.
This operation does not require downtime.
Data migrations can be tricky. The usual approach to migrate data is to take a 3 step approach:
- Migrate the initial batch of data
- Deploy the application code
- Migrate any remaining data
Usually this works, but not always. For example, if a field's format is to be changed from JSON to something else we have a bit of a problem. If we were to change existing data before deploying application code we'll most likely run into errors. On the other hand, if we were to migrate after deploying the application code we could run into the same problems.
If you merely need to correct some invalid data, then a post-deployment migration is usually enough. If you need to change the format of data (e.g. from JSON to something else) it's typically best to add a new column for the new data format, and have the application use that. In such a case the procedure would be:
- Add a new column in the new format
- Copy over existing data to this new column
- Deploy the application code
- In a post-deployment migration, copy over any remaining data
In general there is no one-size-fits-all solution, therefore it's best to discuss these kind of migrations in a merge request to make sure they are implemented in the best way possible.