Omnibus GitLab architecture and components

Omnibus GitLab is a customized fork of the Omnibus project from Chef, and it uses Chef components like cookbooks and recipes to configure GitLab on a user’s computer. The Omnibus GitLab repository on GitLab.com hosts all the necessary components of Omnibus GitLab. These include the parts of Omnibus that are required to build the package, like configurations and project metadata, and the Chef-related components that are used on a user’s computer after installation.

Omnibus GitLab components

An in-depth video walkthrough of these components is available on YouTube.

Software definitions

GitLab project definition file

A primary component of the omnibus architecture is a project definition file that lists the project details and dependency relations to external software and libraries.

The main components of this project definition file are:

  • Project metadata: Includes attributes such as the project’s name and description.
  • License details of the project.
  • Dependency list: List of external tools and software which are required to build or run GitLab, and sometimes their metadata.
  • Global configuration variables used for installation of GitLab: Includes the installation directory, system user, and system group.

Individual software definitions

Omnibus GitLab follows a batteries-included style of distribution. All of the software, libraries, and binaries necessary for the proper functioning of a GitLab instance are provided as part of the package, in an embedded format.

Another major component of the omnibus architecture is therefore the set of software definitions and configurations. A typical software configuration consists of the following parts:

  • Version of the software required.
  • License of the software.
  • Dependencies for the software to be built/run.
  • Commands needed to build the software and embed it inside the package.

Sometimes, the source code of a piece of software may have to be patched to use it with GitLab. This may be to fix a security vulnerability, add some functionality needed for GitLab, or make it work with other components of GitLab. For this purpose, Omnibus GitLab includes a patch directory, where patches for different software are stored.

For more extensive changes, it may be more convenient to track the required changes in a branch on the mirror. The pattern to follow is to create a branch from an upstream tag or SHA and to reference that branch point in the name of the branch. As an example from the omnibus codebase, gitlab-omnibus-v5.6.10 is based on the v5.6.10 tag of the upstream project. This allows us to generate a comparison link like https://gitlab.com/gitlab-org/omnibus/compare/v5.6.10...gitlab-omnibus-v5.6.10 to identify what local changes are present.

Global GitLab configuration template

Omnibus GitLab ships with a single configuration file that can be used to configure every part of the GitLab instance installed on the user’s computer. This configuration file acts as the canonical source of all configuration settings that are applied to the GitLab instance. It lists the general settings for a GitLab instance as well as various options for different components. Settings in this file are specified in the format <component>['<setting>'] = <value>. All the available options are listed in the configuration template, but all except the ones necessary for the basic working of GitLab are commented out by default. Users may uncomment them and specify corresponding values, if necessary.
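To make the <component>['<setting>'] = <value> shape concrete, here is a minimal Python sketch that collects the uncommented settings from a small sample. It only illustrates the structure of the file: the real configuration file is Ruby and is not parsed line by line like this, and the component name example_service and its settings are hypothetical.

```python
import ast
import re

# Matches lines such as: example_service['listen_port'] = 8080
SETTING_RE = re.compile(r"^\s*(\w+)\['(\w+)'\]\s*=\s*(.+?)\s*$")


def parse_settings(text):
    """Collect uncommented <component>['<setting>'] = <value> lines."""
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out options keep their default values
        match = SETTING_RE.match(line)
        if match:
            component, setting, raw_value = match.groups()
            # literal_eval only handles simple literals (numbers, quoted strings)
            settings.setdefault(component, {})[setting] = ast.literal_eval(raw_value)
    return settings


# Hypothetical sample: one option left commented out, two set by the user.
SAMPLE = """
# example_service['log_level'] = 'info'
example_service['listen_port'] = 8080
example_service['log_directory'] = '/var/log/example'
"""

parsed = parse_settings(SAMPLE)
```

In this sketch, parsed contains only listen_port and log_directory for example_service, because the commented-out log_level line is skipped.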

GitLab Cookbook

Omnibus GitLab, as previously described, uses many of the Chef components like cookbooks, attributes, and resources. GitLab EE uses a separate cookbook that extends the one GitLab CE uses and adds the EE-only components. The major players in the Chef-related part of Omnibus GitLab are the following:

Default Attributes

Default attributes, as the name suggests, specify the default values for the different settings provided in the configuration file. These values act as a fail-safe and are used if the user doesn’t provide a value for a setting, which ensures a working GitLab instance with minimal user tweaking.
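The fallback behaviour described above can be sketched as a recursive merge in which user-supplied values win and everything else comes from the defaults. This is a minimal Python illustration, not the Chef attribute mechanism itself; the component name example_service and its settings are hypothetical.

```python
import copy


def deep_merge(defaults, overrides):
    """Return defaults with overrides applied, recursing into nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults and a user configuration that changes one setting.
DEFAULTS = {"example_service": {"enable": True, "listen_port": 8080}}
USER_SETTINGS = {"example_service": {"listen_port": 9090}}

effective = deep_merge(DEFAULTS, USER_SETTINGS)
```

Here effective keeps enable from the defaults and takes listen_port from the user, while DEFAULTS itself is left unmodified.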

Recipes

Recipes do most of the heavy lifting while installing GitLab using the omnibus package, as they are responsible for setting up each component of the GitLab ecosystem on a user’s computer. They create necessary files, directories, and links in their corresponding locations, set their permissions and owners, configure, start, and stop necessary services, and notify these services when the files corresponding to them change. A master recipe, named default, acts as the entry point and invokes all other necessary recipes for various components and services.

Custom Resources

Custom Resources can be thought of as global-level macros that are available across recipes: they define resources that different recipes can reuse. Some common uses for Custom Resources are defining the ports used for common services and listing important directories that may be used by different recipes.

Templates for configuration of components

As mentioned earlier, Omnibus GitLab provides a single configuration file to tweak all components of a GitLab instance. However, the architectural design of different components may require them to have individual configuration files residing at specific locations. These configuration files have to be generated either from the values specified by the user in the general configuration file or from the default values. Hence, Omnibus GitLab ships with templates of such configuration files, with placeholders that are filled with default values or values from the user. The recipes complete these templates by filling them in and placing them at the necessary locations.
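As a rough illustration of how a template with placeholders becomes a component-specific configuration file, the following Python sketch fills a template from defaults and user values. Chef templates are typically ERB files rendered by recipes, so string.Template only stands in for that mechanism here; the template text and setting names are hypothetical.

```python
from string import Template

# Hypothetical component configuration template with two placeholders.
CONFIG_TEMPLATE = Template("listen_port = $listen_port\nlog_level = $log_level\n")


def render(template, defaults, user_settings):
    """Fill the template, preferring user-specified values over defaults."""
    values = {**defaults, **user_settings}
    return template.substitute(values)


rendered = render(
    CONFIG_TEMPLATE,
    defaults={"listen_port": 8080, "log_level": "info"},
    user_settings={"log_level": "debug"},
)
```

In this sketch, rendered contains the default listen_port and the user-specified log_level. A recipe would then write the result to the location the component expects.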

General library methods

Omnibus GitLab also ships some library methods that primarily serve the purpose of code reuse. These include methods to check if services are up and running, methods to check if files exist, and helper methods to interact with different components. They’re often used in Chef recipes.

Of all the libraries used in Omnibus GitLab, there are some special ones: the primary GitLab module and all the component-specific libraries that it invokes. The component-specific libraries contain methods that parse the configuration file for settings defined for their corresponding components. The primary GitLab module contains methods that coordinate this. It is responsible for identifying default values, invoking component-specific libraries, merging the default values and user-specified values, validating them, and generating additional configurations based on their initial values. Every top-level component that’s shipped by the Omnibus GitLab package gets added to this module so that it can be mentioned in the configuration file and default attributes and get parsed correctly.

runit

GitLab uses runit recipes for service management and supervision. runit recipes identify the init system used by the OS and perform basic service management tasks like creating necessary service files for GitLab, enabling services, and reloading services. runit provides runit_service definitions that can be used by other recipes to interact with services. See /files/gitlab-cookbooks/runit for more information.

Services

Services are software processes that we run using the runit process init/supervisor. You can check their status, start, stop, and restart them using the gitlab-ctl commands. Recipes may also disable or enable these services based on their process group and the settings/roles that have been configured for the instance of GitLab. The list of services and the service groups associated with them can be found in files/gitlab-cookbooks/package/libraries/config/services.rb.
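The idea of enabling services based on their groups can be sketched as follows. This is a minimal Python illustration under stated assumptions: the service names, group names, and the data layout are hypothetical and do not reproduce the contents of services.rb.

```python
# Hypothetical mapping of service name to the groups it belongs to.
SERVICE_GROUPS = {
    "service_a": {"core"},
    "service_b": {"web"},
    "service_c": {"web", "monitoring"},
}


def enabled_services(service_groups, active_groups):
    """Return the services that belong to at least one active group."""
    return sorted(
        name
        for name, groups in service_groups.items()
        if groups & active_groups
    )


web_services = enabled_services(SERVICE_GROUPS, {"web"})
```

With only the web group active, web_services lists service_b and service_c. A role that activates a different set of groups would yield a different set of services.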

Additional gitlab-ctl commands

Omnibus, by default, provides some wrapper commands like gitlab-ctl reconfigure and gitlab-ctl restart to manage the GitLab instance. The Omnibus GitLab repository defines some additional wrapper commands that target specific use cases. These commands are used with the general gitlab-ctl command to perform less common tasks, such as running database migrations or removing dormant accounts.

Tests

The Omnibus GitLab repository uses ChefSpec to test the cookbooks and recipes it ships. The usual strategy is to check whether a recipe behaves correctly in two (or more) conditions: when the user doesn’t specify any corresponding configuration (that is, when defaults are used), and when user-specified configuration is used. Tests may include checking if files are generated in the correct locations, services are started, stopped, or notified, correct binaries are invoked, and correct parameters are passed to method invocations. Recipes and library methods have tests associated with them. Omnibus GitLab also uses some support methods or macros to help in the testing process. Where possible, the tests are written to be compatible with parallelization, to decrease the time required to run the entire test suite.

Of the components described above, some (such as software definitions, project metadata, and tests) are used during package building, in a build environment, and some (such as Chef cookbooks and recipes, the GitLab configuration file, runit, and gitlab-ctl commands) are used to configure the user’s installed instance.

Work life cycle of Omnibus GitLab

What happens during package building

The type of package being built depends on the OS on which the build process is run. If the build is done in a Debian environment, a .deb package is created. What happens during package building can be summarized in the following steps:

  1. Fetching sources of dependency software:
    1. Parsing software definitions to find out corresponding versions.
    2. Getting source code from remotes or cache.
  2. Building individual software components:
    1. Setting up necessary environment variables and flags.
    2. Applying patches, if applicable.
    3. Performing the build and installation of the component, which involves installing it to an appropriate location (inside /opt/gitlab).
  3. Generating license information of all bundled components - including external software, Ruby gems, and JS modules. This involves analyzing definitions of each dependency as well as any additional licensing document provided by the components (like the licenses.csv file provided by GitLab Rails).
  4. Checking the license of the components to make sure we are not shipping a component with a non-compatible license.
  5. Running a health check on the package to make sure the binaries are linked against available libraries. For bundled libraries, the binaries should link against them and not the ones available globally.
  6. Building the package with the contents of /opt/gitlab. This makes use of the metadata given inside the gitlab.rb file. This includes the package name, version, maintainer, homepage, and information regarding conflicts with other packages.

Caching

Omnibus uses two types of cache to optimize the build process: one to store the software artifacts (sources of dependent software), and one to store the project tree after each software component is built.

Software artifact cache (for GitLab Inc builds)

The software artifact cache uses an Amazon S3 bucket to store the sources of the dependent software. In our build process, this cache is populated using the command bin/omnibus cache populate. This pulls in all the necessary software sources from the Amazon bucket and stores them in the necessary locations. When there is a change in the version requirement of a piece of software, omnibus pulls it from the original upstream and adds it to the artifact cache. This process is internal to omnibus, and we configure the Amazon bucket to use in the omnibus.rb file available in the root of the repository. This cache ensures the availability of the dependent software even if their original upstream remotes go down.

Build cache

A second type of cache that plays an important role in our build process is the build cache. The build cache can be described as a set of snapshots of the project tree (where the project gets built - /opt/gitlab) taken after each dependent piece of software is built. Consider a project with five dependent pieces of software - A, B, C, D, and E - built in that order (ignoring their own dependencies). The build cache makes use of Git tags to make snapshots. After each piece of software is built, a Git tag is computed and committed.

Now, consider that we made some change to the definition of software D. A, B, C, and E remain the same. When we try to build again, omnibus can reuse the snapshot that was made before D was built in the previous build. Thus, the time taken to build A, B, and C can be saved, as omnibus can simply check out the snapshot that was made after C was built. Omnibus uses the snapshot made just before the software that “dirtied” the cache was built. Dirtying can happen through a change in the software definition, a change in the name or version of a previous component, or a change in the version of the current component. Similarly, if in a build there is a change in the definition of software A, it dirties the cache, and hence A and all the following dependencies get built from scratch. If C dirties the cache, A and B get reused and C, D, and E get built again from scratch.

This cache makes sense only if it is retained across builds. For that, we use the caching mechanism of GitLab CI. We have a dedicated runner which is configured to store its internal cache in an Amazon bucket. Before each build, we pull in this cache (restore_cache_bundle target in our Makefile), move it to an appropriate location, and start the build. It gets used by omnibus until the point of dirtying. After the build, we pack the new cache and tell CI to back it up to the Amazon bucket (pack_cache_bundle in our Makefile).

Both types of cache reduce the overall build time of GitLab and its dependence on external factors.
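The A to E example above can be sketched as a function that splits the build order at the first component that dirtied the cache. This is a minimal Python illustration of the reuse rule only; how omnibus detects dirtying and manages its Git snapshots is not modelled, and the function name plan_build is hypothetical.

```python
def plan_build(order, dirty):
    """Split the build order into (reused, rebuilt) at the first dirty component.

    Components before the first dirty one are restored from the snapshot;
    the dirty component and everything after it are built from scratch.
    """
    for index, name in enumerate(order):
        if name in dirty:
            return order[:index], order[index:]
    return list(order), []


ORDER = ["A", "B", "C", "D", "E"]

reused, rebuilt = plan_build(ORDER, {"D"})
```

With D dirty, reused is A, B, and C, and rebuilt is D and E. Note that E is rebuilt even though its definition did not change, because it comes after the point of dirtying.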

The cache mechanism can be summarized as follows:

  1. For each software dependency:
    1. Parse definition to understand version and SHA256.
    2. If the source file tarball available in the artifact cache in the Amazon bucket matches the version and SHA256, use it.
    3. Else, download the correct tarball from the upstream remote.
  2. Get the cache from the CI cache.
  3. For each software dependency:
    1. If a cache has been dirtied, break the loop.
    2. Else, check out the snapshot.
  4. If there are remaining dependencies:
    1. For each remaining dependency:
      1. Build the dependency.
      2. Create a snapshot and commit it.
  5. Push back the new build cache to the CI cache.
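Step 1 of the summary above can be sketched as a lookup that prefers the artifact cache and falls back to the upstream remote. This is a minimal Python illustration under stated assumptions: the cache is an in-memory dictionary rather than an Amazon bucket, the download callable and the name libexample are hypothetical, and verifying the freshly downloaded tarball is an assumption of this sketch.

```python
import hashlib


def fetch_source(name, version, expected_sha256, artifact_cache, download):
    """Return (data, origin), using the cache when version and SHA256 match."""
    cached = artifact_cache.get((name, version))
    if cached is not None and hashlib.sha256(cached).hexdigest() == expected_sha256:
        return cached, "cache"
    # Cache miss or mismatch: fall back to the upstream remote.
    data = download(name, version)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError(f"checksum mismatch for {name} {version}")
    artifact_cache[(name, version)] = data  # keep it for later builds
    return data, "upstream"


# Hypothetical tarball contents and an initially empty cache.
PAYLOAD = b"example source tarball"
DIGEST = hashlib.sha256(PAYLOAD).hexdigest()
cache = {}

first = fetch_source("libexample", "1.0", DIGEST, cache, lambda n, v: PAYLOAD)
second = fetch_source("libexample", "1.0", DIGEST, cache, lambda n, v: b"unused")
```

In this sketch the first call downloads from upstream and populates the cache, and the second call is served from the cache without invoking the download callable.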

What happens during gitlab-ctl reconfigure

One of the commonly used commands while managing a GitLab instance is gitlab-ctl reconfigure. This command, in short, parses the config file and runs the recipes with the values supplied from it. The recipes to be run are defined in a file called dna.json, present in the embedded folder inside the installation directory. This file is generated by a software dependency named gitlab-cookbooks that’s defined in the software definitions. In the case of GitLab CE, the cookbook named gitlab is selected as the master recipe, which in turn invokes all other necessary recipes, including runit. In short, reconfigure is a chef-client run that configures different files and services with the values provided in the configuration template.
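To illustrate how a file like dna.json can name the recipes to run, the following Python sketch extracts recipe names from a Chef-style run list. The sample content is an assumption made for illustration (a run_list with a single recipe[gitlab] entry) and may not match the file shipped in the package.

```python
import json
import re

# Chef run lists name recipes as "recipe[<name>]".
RUN_LIST_ITEM = re.compile(r"^recipe\[([^\]]+)\]$")


def recipes_from_dna(dna_text):
    """Return the recipe names listed in the run_list of a dna.json document."""
    dna = json.loads(dna_text)
    recipes = []
    for item in dna.get("run_list", []):
        match = RUN_LIST_ITEM.match(item)
        if match:
            recipes.append(match.group(1))
    return recipes


# Assumed sample content, not copied from a real installation.
SAMPLE_DNA = '{"run_list": ["recipe[gitlab]"]}'

recipes = recipes_from_dna(SAMPLE_DNA)
```

For the assumed sample, recipes contains only gitlab, which corresponds to the master recipe that in turn invokes the others.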

Multiple databases

Previously, the GitLab Rails application was the sole client connected to the Omnibus GitLab database. Over time, this has changed:

  • Praefect and Container Registry use their own databases.
  • The Rails application now uses a decomposed database.

Because additional databases might be necessary: