Multiple Databases
To allow GitLab to scale further, we decomposed the GitLab application database into multiple databases.
The main databases are main, ci, and (optionally) sec. GitLab supports being run with one, two, or three databases.
On GitLab.com we are using separate main and ci databases.
For the purpose of building the Cells architecture, we are decomposing the databases further to introduce another database, gitlab_main_clusterwide.
GitLab Schema
To discover which access patterns are allowed between different databases, the GitLab application implements the database dictionary.
The database dictionary provides a virtual classification of tables into a gitlab_schema, which is conceptually similar to a PostgreSQL schema.
As part of using database schemas to better isolate CI decomposed features, we decided that we cannot use PostgreSQL schemas due to complex migration procedures. Instead, we implemented the concept of application-level classification.
Each GitLab table needs to have a gitlab_schema assigned:
gitlab_schema | Description | Notes |
---|---|---|
gitlab_main | All tables that are being stored in the main: database. | Currently, this is being replaced with gitlab_main_cell for the purpose of building the Cells architecture. The gitlab_main_cell schema describes all tables that are local to a cell in a GitLab installation. For example, projects and groups. |
gitlab_main_clusterwide | All tables where all rows, or a subset of rows, need to be present across the cluster in the Cells architecture. For example, users and application_settings. | For the Cells 1.0 architecture, there are no real clusterwide tables, as each cell will have its own database. In effect, these tables will still be stored locally in each cell. |
gitlab_ci | All CI tables that are being stored in the ci: database (for example, ci_pipelines, ci_builds). | |
gitlab_geo | All Geo tables that are being stored in the geo: database (for example, project_registry, secondary_usage_data). | |
gitlab_shared | All application tables that contain data across all decomposed databases (for example, loose_foreign_keys_deleted_records) for models that inherit from Gitlab::Database::SharedModel. | |
gitlab_internal | All internal tables of Rails and PostgreSQL (for example, ar_internal_metadata, schema_migrations, pg_*). | |
gitlab_pm | All tables that store package_metadata. | It is an alias for gitlab_main, to be replaced with gitlab_sec. |
gitlab_sec | All Security and Vulnerability feature tables to be stored in the sec: database. | Decomposition in progress. |

More schemas are to be introduced with additional decomposed databases.
The gitlab_schema assigned to a table determines the base class that its model must use:
- ApplicationRecord for gitlab_main / gitlab_main_cell
- Ci::ApplicationRecord for gitlab_ci
- Geo::TrackingBase for gitlab_geo
- Gitlab::Database::SharedModel for gitlab_shared
- PackageMetadata::ApplicationRecord for gitlab_pm
- Gitlab::Database::SecApplicationRecord for gitlab_sec
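
The pairing above can be written down as a simple lookup. This is a minimal sketch that only restates the list as data: the constant BASE_CLASS_FOR_SCHEMA and the method expected_base_class are hypothetical names chosen for illustration and are not part of the GitLab codebase. Class names are kept as strings so the snippet runs without Rails loaded.

```ruby
# Hypothetical lookup table that mirrors the list above.
# Class names are strings so that no Rails classes need to be loaded.
BASE_CLASS_FOR_SCHEMA = {
  gitlab_main: 'ApplicationRecord',
  gitlab_main_cell: 'ApplicationRecord',
  gitlab_ci: 'Ci::ApplicationRecord',
  gitlab_geo: 'Geo::TrackingBase',
  gitlab_shared: 'Gitlab::Database::SharedModel',
  gitlab_pm: 'PackageMetadata::ApplicationRecord',
  gitlab_sec: 'Gitlab::Database::SecApplicationRecord'
}.freeze

# Returns the base class name expected for a gitlab_schema.
# Raises ArgumentError for schemas that the list above does not cover.
def expected_base_class(gitlab_schema)
  BASE_CLASS_FOR_SCHEMA.fetch(gitlab_schema.to_sym) do
    raise ArgumentError, "no base class listed for #{gitlab_schema}"
  end
end

puts expected_base_class(:gitlab_ci)
puts expected_base_class('gitlab_sec')
```

A check like this could, for example, be used in a test that compares a model's actual superclass with the class expected for its table's schema.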
Choose either the gitlab_main_cell or gitlab_main_clusterwide schema
This content has been moved to a new location
Defining a sharding key for all cell-local tables
This content has been moved to a new location
The impact of gitlab_schema
The usage of gitlab_schema has a significant impact on the application.
The primary purpose of gitlab_schema is to introduce a barrier between different data access patterns.
It is used as the primary source of classification for:
- Discovering cross-joins across tables from different schemas
- Discovering cross-database transactions across tables from different schemas
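
To illustrate how a table classification can surface such cross-schema access, here is a rough sketch. The table-to-schema pairs are taken from the examples on this page, but the constant TABLE_SCHEMAS and both method names are hypothetical. The rule used here (touching more than one schema counts as crossing) is a deliberate simplification and does not model special cases such as gitlab_shared or gitlab_internal.

```ruby
# Hypothetical, hand-written subset of the table classification.
TABLE_SCHEMAS = {
  'projects' => :gitlab_main_cell,
  'users' => :gitlab_main_clusterwide,
  'ci_pipelines' => :gitlab_ci,
  'ci_builds' => :gitlab_ci
}.freeze

# Returns the distinct schemas touched by a list of table names.
# Raises KeyError for a table that has no classification in TABLE_SCHEMAS.
def schemas_for(tables)
  tables.map { |table| TABLE_SCHEMAS.fetch(table) }.uniq
end

# Simplified rule: a query or transaction crosses the barrier
# when its tables belong to more than one schema.
def crosses_schemas?(tables)
  schemas_for(tables).size > 1
end

puts crosses_schemas?(%w[ci_pipelines ci_builds])
puts crosses_schemas?(%w[projects ci_builds])
```

The first call stays within gitlab_ci, while the second mixes gitlab_main_cell and gitlab_ci tables, which is the kind of join or transaction the classification is meant to flag.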
The special purpose of gitlab_shared
gitlab_shared is a special case that describes tables or views that, by design, contain data across all decomposed databases. This classification describes application-defined tables (like loose_foreign_keys_deleted_records).
Use gitlab_shared with care, because it requires special handling when accessing data.
Because gitlab_shared shares not only structure but also data, the application must be written so that it traverses the data from all databases sequentially.
Gitlab::Database::EachDatabase.each_model_connection([MySharedModel]) do |connection, connection_name|
  MySharedModel.select_all_data...
end
As such, migrations modifying data of gitlab_shared tables are expected to run across all decomposed databases.
The special purpose of gitlab_internal
gitlab_internal describes Rails-defined tables (like schema_migrations or ar_internal_metadata), as well as internal PostgreSQL tables (for example, pg_attribute). Its primary purpose is to support other databases, like Geo, that might be missing some of the application-defined gitlab_shared tables (like loose_foreign_keys_deleted_records), but are valid Rails databases.
The special purpose of gitlab_pm
gitlab_pm stores package metadata describing public repositories. This data is used for the License Compliance and Dependency Scanning product categories and is maintained by the Composition Analysis Group. It is an alias for gitlab_main, intended to make it easier to route to a different database in the future.
Migrations
Read Migrations for Multiple Databases.
CI/CD Database
Configure single database
By default, GDK is configured to run with multiple databases.
Switching back and forth between single and multiple databases in the same development instance is discouraged. Any data in the ci database is not accessible in single-database mode. To run with a single database, use a separate development instance.
To configure GDK to use a single database:
- In the GDK root directory, run:

      gdk config set gitlab.rails.databases.ci.enabled false

- Reconfigure GDK:

      gdk reconfigure

To switch back to using multiple databases, set gitlab.rails.databases.ci.enabled to true and run gdk reconfigure.