Multiple Databases

To allow GitLab to scale further, we decomposed the GitLab application database into multiple databases. The main databases are main, ci, and (optionally) sec. GitLab can run with one, two, or three databases. On GitLab.com, we use separate main and ci databases.

To build the Cells architecture, we are decomposing the databases further and introducing another database, gitlab_main_clusterwide.

GitLab Schema

To discover which access patterns are allowed between different databases, the GitLab application implements the database dictionary.

The database dictionary provides a virtual classification of tables into a gitlab_schema, which is conceptually similar to a PostgreSQL schema. When we evaluated database schemas as a way to better isolate the decomposed CI features, we decided that we could not use PostgreSQL schemas because of the complex migration procedures involved. Instead, we implemented an application-level classification. Each GitLab table must have a gitlab_schema assigned:

| Database | Description | Notes |
|----------|-------------|-------|
| gitlab_main | All tables that are being stored in the main: database. | Currently, this is being replaced with gitlab_main_cell, for the purpose of building the Cells architecture. The gitlab_main_cell schema describes all tables that are local to a cell in a GitLab installation. For example, projects and groups. |
| gitlab_main_clusterwide | All tables where all rows, or a subset of rows, need to be present across the cluster in the Cells architecture. For example, users and application_settings. | For the Cells 1.0 architecture, there are no real clusterwide tables because each cell will have its own database. In effect, these tables will still be stored locally in each cell. |
| gitlab_ci | All CI tables that are being stored in the ci: database (for example, ci_pipelines, ci_builds). | |
| gitlab_geo | All Geo tables that are being stored in the geo: database (for example, project_registry, secondary_usage_data). | |
| gitlab_shared | All application tables that contain data across all decomposed databases (for example, loose_foreign_keys_deleted_records), for models that inherit from Gitlab::Database::SharedModel. | |
| gitlab_internal | All internal tables of Rails and PostgreSQL (for example, ar_internal_metadata, schema_migrations, pg_*). | |
| gitlab_pm | All tables that store package_metadata. | It is an alias for gitlab_main, to be replaced with gitlab_sec. |
| gitlab_sec | All Security and Vulnerability feature tables to be stored in the sec: database. | Decomposition in progress. |

More schemas to be introduced with additional decomposed databases
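
As a minimal sketch of what an application-level classification can look like, the snippet below parses a single dictionary entry and reads its gitlab_schema. The YAML layout, the DictionaryEntry struct, and the parse_entry helper are assumptions made for illustration. Only the table name and the schema value come from the table above.

```ruby
require 'yaml'

# Hypothetical dictionary entry; the field layout is assumed for illustration.
ENTRY_YAML = <<~YAML
  table_name: ci_pipelines
  gitlab_schema: gitlab_ci
YAML

DictionaryEntry = Struct.new(:table_name, :gitlab_schema)

# Parses one entry and fails loudly when no gitlab_schema is assigned,
# because every table needs one.
def parse_entry(yaml)
  data = YAML.safe_load(yaml)
  schema = data['gitlab_schema']
  raise ArgumentError, "#{data['table_name']} has no gitlab_schema" if schema.nil?

  DictionaryEntry.new(data['table_name'], schema.to_sym)
end

entry = parse_entry(ENTRY_YAML)
puts "#{entry.table_name} -> #{entry.gitlab_schema}"
```

Raising on a missing value mirrors the rule that every table needs a classification; a real implementation may report the problem differently.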

The gitlab_schema of a table determines the base class that its model must use:

  • ApplicationRecord for gitlab_main/gitlab_main_cell.
  • Ci::ApplicationRecord for gitlab_ci.
  • Geo::TrackingBase for gitlab_geo.
  • Gitlab::Database::SharedModel for gitlab_shared.
  • PackageMetadata::ApplicationRecord for gitlab_pm.
  • Gitlab::Database::SecApplicationRecord for gitlab_sec.
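
The pairing in the list above can be expressed as a lookup table. This is only a sketch: the BASE_CLASS_FOR_SCHEMA constant and the expected_base_class helper are hypothetical names, and class names are kept as strings so the snippet runs outside the GitLab codebase.

```ruby
# Schema-to-base-class pairs copied from the list above.
BASE_CLASS_FOR_SCHEMA = {
  gitlab_main: 'ApplicationRecord',
  gitlab_main_cell: 'ApplicationRecord',
  gitlab_ci: 'Ci::ApplicationRecord',
  gitlab_geo: 'Geo::TrackingBase',
  gitlab_shared: 'Gitlab::Database::SharedModel',
  gitlab_pm: 'PackageMetadata::ApplicationRecord',
  gitlab_sec: 'Gitlab::Database::SecApplicationRecord'
}.freeze

# Hypothetical helper: returns the expected base class name for a schema,
# or raises when the list above names no base class for it.
def expected_base_class(gitlab_schema)
  BASE_CLASS_FOR_SCHEMA.fetch(gitlab_schema) do
    raise ArgumentError, "no base class listed for #{gitlab_schema}"
  end
end

puts expected_base_class(:gitlab_ci)
```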

Choose either the gitlab_main_cell or gitlab_main_clusterwide schema

This content has been moved to a new location

Defining a sharding key for all cell-local tables

This content has been moved to a new location

The impact of gitlab_schema

The usage of gitlab_schema has a significant impact on the application. The primary purpose of gitlab_schema is to introduce a barrier between different data access patterns.

This is used as a primary source of classification for:
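
To illustrate what such a barrier could look like, here is a minimal sketch that flags a set of tables when it spans more than one gitlab_schema. The TABLE_SCHEMAS map, the IGNORED_SCHEMAS set, and both helpers are hypothetical. Ignoring gitlab_shared and gitlab_internal when counting schemas is an assumption of this sketch, not a rule stated here.

```ruby
require 'set'

# Classification of a few example tables, following the schema table above.
TABLE_SCHEMAS = {
  'projects' => :gitlab_main_cell,
  'ci_pipelines' => :gitlab_ci,
  'ci_builds' => :gitlab_ci,
  'loose_foreign_keys_deleted_records' => :gitlab_shared,
  'schema_migrations' => :gitlab_internal
}.freeze

# Assumption for this sketch: shared and internal tables are not counted
# when deciding whether a set of tables crosses a schema boundary.
IGNORED_SCHEMAS = Set[:gitlab_shared, :gitlab_internal].freeze

# Returns the set of schemas the given tables belong to.
def schemas_for(tables)
  tables.map { |table| TABLE_SCHEMAS.fetch(table) }.to_set
end

# True when the tables belong to more than one counted schema.
def crosses_schemas?(tables)
  (schemas_for(tables) - IGNORED_SCHEMAS).size > 1
end

puts crosses_schemas?(%w[ci_pipelines ci_builds]) # false: both are gitlab_ci
puts crosses_schemas?(%w[projects ci_builds])     # true: two different schemas
```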

The special purpose of gitlab_shared

gitlab_shared is a special case that describes tables or views that, by design, contain data across all decomposed databases. This classification describes application-defined tables (like loose_foreign_keys_deleted_records).

Be careful when using gitlab_shared, because it requires special handling when accessing data. Since gitlab_shared shares not only structure but also data, the application must be written so that it traverses the data from all databases sequentially.

# The block runs once for each database connection, one after another.
Gitlab::Database::EachDatabase.each_model_connection([MySharedModel]) do |connection, connection_name|
  MySharedModel.select_all_data...
end

As such, migrations modifying data of gitlab_shared tables are expected to run across all decomposed databases.

The special purpose of gitlab_internal

gitlab_internal describes Rails-defined tables (like schema_migrations or ar_internal_metadata), as well as internal PostgreSQL tables (for example, pg_attribute). Its primary purpose is to support other databases, like Geo, that might be missing some of the application-defined gitlab_shared tables (like loose_foreign_keys_deleted_records) but are still valid Rails databases.

The special purpose of gitlab_pm

gitlab_pm stores package metadata describing public repositories. This data is used for the License Compliance and Dependency Scanning product categories and is maintained by the Composition Analysis Group. It is an alias for gitlab_main intended to make it easier to route to a different database in the future.

Migrations

Read Migrations for Multiple Databases.

CI/CD Database

Configure single database

By default, GDK is configured to run with multiple databases.

Switching back and forth between single and multiple databases in the same development instance is discouraged. Any data in the ci database is not accessible in single-database mode. To work with a single database, use a separate development instance.

To configure GDK to use a single database:

  1. In the GDK root directory, run:

    gdk config set gitlab.rails.databases.ci.enabled false
  2. Reconfigure GDK:

    gdk reconfigure

To switch back to using multiple databases, set gitlab.rails.databases.ci.enabled to true and run gdk reconfigure.