GitLab Documentation

GitLab Geo database replication

Note: This is the documentation for the Omnibus GitLab packages. For installations from source, follow the database replication for installations from source guide.

Note: Stages of the setup process must be completed in the documented order. Before attempting the steps in this stage, complete all prior stages.

This document describes the minimal steps required to replicate your GitLab database to another server. You may have to adjust some values according to your database setup and its size.

You are encouraged to first read through all the steps before executing them in your testing/production environment.

PostgreSQL replication

The GitLab primary node, where the write operations happen, connects to the primary database server. The secondary nodes, which are read-only, connect to secondary database servers (which are also read-only).

Note: In much database documentation, you will see "primary" referred to as "master" and "secondary" as either "slave" or "standby" server (read-only).

Since GitLab 9.4: We recommend using PostgreSQL replication slots to ensure the primary retains all the data necessary for the secondaries to recover. See below for more details.

Prerequisites

The following guide assumes that:

If your GitLab installation is using external PostgreSQL, the Omnibus roles will not be able to perform all necessary configuration steps. Refer to External PostgreSQL instances for additional instructions.

Step 1. Configure the primary server

  1. SSH into your GitLab primary server and log in as root:

    sudo -i
    
  2. Execute the command below to define the node as primary Geo node:

    gitlab-ctl set-geo-primary-node
    

    This command will use your defined external_url in /etc/gitlab/gitlab.rb.

  3. Omnibus GitLab already has a replication user called gitlab_replicator. You must set its password manually. You will be prompted to enter a password:

    gitlab-ctl set-replication-password
    

    This command also reads the postgresql['sql_replication_user'] Omnibus setting, in case you have changed the gitlab_replicator username to something else.

  4. Set up TLS support for the PostgreSQL primary server

    Warning: Only skip this step if you know that PostgreSQL traffic between the primary and secondary will be secured through some other means, e.g., a known-safe physical network path or a site-to-site VPN that you have configured.

    If you are replicating your database across the open Internet, it is essential that the connection is TLS-secured. Correctly configured, this provides protection against both passive eavesdroppers and active "man-in-the-middle" attackers.

    To do this, PostgreSQL needs to be provided with a key and certificate to use. There are two options to do this:

    Option A: Re-use the same files you're using for your main GitLab instance.

    Option B: Generate a self-signed certificate just for PostgreSQL's use.

    Prefer option A if you already have a long-lived certificate. Prefer option B if your certificates expire regularly (e.g., Let's Encrypt), or if PostgreSQL is running on a different server to the main GitLab services (this may be the case in a HA configuration, for instance).

    For Option A:

    Copy the SSL keys from your existing GitLab installation. If you're re-using certificates already in GitLab, they are likely to be in the /etc/gitlab/ssl directory. Copy them into the PostgreSQL directory via this example:

    # Certificate and key currently used by GitLab
    # - replace primary.geo.example.com with your domain
    install -o gitlab-psql -g gitlab-psql -m 0400 -T /etc/gitlab/ssl/primary.geo.example.com.crt ~gitlab-psql/data/server.crt
    install -o gitlab-psql -g gitlab-psql -m 0400 -T /etc/gitlab/ssl/primary.geo.example.com.key ~gitlab-psql/data/server.key
    

    For Option B:

    To generate a self-signed certificate and key, run this command:

    openssl req -nodes -batch -x509 -newkey rsa:4096 -keyout server.key -out server.crt -days 3650
    

    This will create two files - server.key and server.crt - that you can use for authentication.

    PostgreSQL's permission requirements are very strict, so whether you're re-using your certificates or just generated new ones, copy the files to the correct location:

    # Self-signed certificate and key
    # - assumes the files are in your current working directory
    install -o gitlab-psql -g gitlab-psql -m 0400 -T server.crt ~gitlab-psql/data/server.crt
    install -o gitlab-psql -g gitlab-psql -m 0400 -T server.key ~gitlab-psql/data/server.key
    
  5. Add this configuration to /etc/gitlab/gitlab.rb. Additional options are documented here.

    postgresql['ssl'] = 'on'
    
  6. Configure PostgreSQL to listen on an external network interface

    Edit /etc/gitlab/gitlab.rb and add the following. Note that GitLab 9.1 added the geo_primary_role configuration variable:

    geo_primary_role['enable'] = true
    postgresql['listen_address'] = '1.2.3.4'
    postgresql['trust_auth_cidr_addresses'] = ['127.0.0.1/32','1.2.3.4/32']
    postgresql['md5_auth_cidr_addresses'] = ['5.6.7.8/32']
    # New for 9.4: Set this to be the number of Geo secondary nodes you have
    postgresql['max_replication_slots'] = 1
    # postgresql['max_wal_senders'] = 10
    # postgresql['wal_keep_segments'] = 10
    

    For external PostgreSQL instances, see additional instructions.

    Where 1.2.3.4 is the IP address of the primary server, and 5.6.7.8 is the IP address of the secondary one.

    For security reasons, PostgreSQL by default only listens on the local interface (e.g. 127.0.0.1). However, GitLab Geo needs to communicate between the primary and secondary nodes over a common network, such as a corporate LAN or the public Internet. For this reason, we need to configure PostgreSQL to listen on more interfaces.

    The listen_address option opens PostgreSQL up to external connections with the interface corresponding to the given IP. See the PostgreSQL documentation for more details.

    Note that if you are running GitLab Geo with a cloud provider (e.g. Amazon Web Services), the internal interface IP (as provided by ifconfig) may be different from the public IP address. For example, suppose you have nodes with the following configuration:

    Node Type   Internal IP   External IP
    Primary     10.1.5.3      54.193.124.100
    Secondary   10.1.10.5     54.193.100.155

    If you are running two nodes in different cloud availability zones, you may need to double check that the nodes can communicate over the internal IP addresses. For example, servers on Amazon Web Services in the same Virtual Private Cloud (VPC) can do this. Google Compute Engine also offers an internal network that supports cross-availability zone networking.

    For the above example, the following configuration uses the internal IPs to replicate the database from the primary to the secondary:

    # Example configuration using internal IPs for a cloud configuration
    geo_primary_role['enable'] = true
    postgresql['listen_address'] = '10.1.5.3'
    postgresql['trust_auth_cidr_addresses'] = ['127.0.0.1/32','10.1.5.3/32']
    postgresql['md5_auth_cidr_addresses'] = ['10.1.10.5/32']
    postgresql['max_replication_slots'] = 1 # Number of Geo secondary nodes
    # postgresql['max_wal_senders'] = 10
    # postgresql['wal_keep_segments'] = 10
    

    If you prefer that your nodes communicate over the public Internet, you may choose the IP addresses from the "External IP" column above.

  7. Optional: If you want to add another secondary, the relevant setting would look like:

    postgresql['md5_auth_cidr_addresses'] = ['5.6.7.8/32','11.22.33.44/32']
    

    You may also want to edit the wal_keep_segments and max_wal_senders to match your database replication requirements. Consult the PostgreSQL - Replication documentation for more information.

  8. Save the file and reconfigure GitLab for the database listen changes to take effect.

    This step will fail. This is caused by Omnibus#2797.

    Restart PostgreSQL:

    gitlab-ctl restart postgresql
    

    Reconfigure GitLab again. It should complete cleanly.

  9. New for 9.4: Restart your primary PostgreSQL server to ensure the replication slot changes take effect (sudo gitlab-ctl restart postgresql for Omnibus-provided PostgreSQL).

  10. Now that the PostgreSQL server is set up to accept remote connections, run netstat -plnt to make sure that PostgreSQL is listening on port 5432 on the IP address you set in listen_address.

  11. Verify that clock synchronization is enabled.

    Important: For Geo to work correctly, all nodes must have their clocks synchronized. It is not required for all nodes to be set to the same time zone, but when the respective times are converted to UTC time, the clocks must be synchronized to within 60 seconds of each other.

    If you are using Ubuntu, verify NTP sync is enabled:

    timedatectl status | grep 'NTP synchronized'
    

    Refer to your Linux distribution documentation to set up clock synchronization. This can easily be done using any NTP-compatible daemon.
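
The listening check in step 10 can be scripted. The following is a minimal sketch and not part of GitLab: listening_on is a hypothetical helper name, and it assumes the usual column layout of netstat -plnt (local address in the fourth column, state in the sixth).

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -plnt` output on stdin and succeeds
# if a TCP socket is listening on exactly the given address and port.
#   $1 = IP address, $2 = port
listening_on() {
    awk -v want="$1:$2" '
        $6 == "LISTEN" && $4 == want { found = 1 }
        END { exit !found }
    '
}

# Usage on the primary (1.2.3.4 is the example listen_address used above):
#   netstat -plnt | listening_on 1.2.3.4 5432 \
#       && echo "PostgreSQL is listening on 1.2.3.4:5432"
```

The match is exact, so a PostgreSQL instance bound to all interfaces (shown by netstat as 0.0.0.0:5432) would not be reported for a specific IP address.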

Step 2. Add the secondary GitLab node

To prevent the secondary Geo node from trying to act as the primary once the database is replicated, the secondary Geo node must be configured on the primary before the database is replicated.

  1. Visit the primary node's Admin Area ➔ Geo Nodes (/admin/geo_nodes) in your browser.
  2. Add the secondary node by providing its full URL. Do NOT check the box 'This is a primary node'.
  3. Added in GitLab 9.5: Choose which namespaces should be replicated by the secondary node. Leave blank to replicate all. Read more in selective replication.
  4. Click the Add node button.

Step 3. Configure the secondary server

  1. SSH into your GitLab secondary server and log in as root:

    sudo -i
    
  2. Set up PostgreSQL TLS verification on the secondary

    If you configured PostgreSQL to accept TLS connections in Step 1, then you need to provide a list of "known-good" certificates to the secondary. It uses this list to keep the connection secure against an active "man-in-the-middle" attack.

    If you reused your existing certificates on the primary, you can use the list of valid root certificates provided with Omnibus.

    Or, if you generated a self-signed certificate, copy the generated server.crt file onto the secondary server from the primary, then install it in the right location.

    # Certificate and key currently used by GitLab
    mkdir -p ~gitlab-psql/.postgresql
    ln -s /opt/gitlab/embedded/ssl/certs/cacert.pem ~gitlab-psql/.postgresql/root.crt
    
    # Self-signed certificate and key
    install -o gitlab-psql -g gitlab-psql -m 0400 -T server.crt ~gitlab-psql/.postgresql/root.crt
    

    PostgreSQL will now only recognize that exact certificate when verifying TLS connections.

  3. Test that the remote connection to the primary server works.

    # Certificate and key currently used by GitLab, and connecting by FQDN
    sudo -u gitlab-psql /opt/gitlab/embedded/bin/psql -h primary.geo.example.com -U gitlab_replicator -d "dbname=gitlabhq_production sslmode=verify-full" -W
    
    # Self-signed certificate and key, or connecting by IP address
    sudo -u gitlab-psql /opt/gitlab/embedded/bin/psql -h 1.2.3.4 -U gitlab_replicator -d "dbname=gitlabhq_production sslmode=verify-ca" -W
    

    When prompted, enter the password you set in the first step for the gitlab_replicator user. If all worked correctly, you should see the database prompt.

    A failure to connect here indicates that the TLS or networking configuration is incorrect. Ensure that you've used the correct certificates and IP addresses / FQDNs throughout. If you have a firewall, ensure that the secondary is permitted to access the primary on port 5432.

  4. Exit the PostgreSQL console:

    \q
    
  5. Edit /etc/gitlab/gitlab.rb and add the following:

    geo_secondary_role['enable'] = true
    

    For external PostgreSQL instances, see additional instructions.

  6. Reconfigure GitLab for the changes to take effect.

  7. Verify that clock synchronization is enabled.

    Important: For Geo to work correctly, all nodes must have their clocks synchronized. It is not required for all nodes to be set to the same time zone, but when the respective times are converted to UTC time, the clocks must be synchronized to within 60 seconds of each other.

    If you are using Ubuntu, verify NTP sync is enabled:

    timedatectl status | grep 'NTP synchronized'
    

    Refer to your Linux distribution documentation to set up clock synchronization. This can easily be done using any NTP-compatible daemon.
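
To make the 60-second requirement concrete, here is a minimal sketch that compares two Unix timestamps. clock_skew_ok is a hypothetical helper name, and the usage line assumes you have SSH access from the secondary to the primary; neither is part of GitLab.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when two Unix timestamps (seconds since the
# epoch, which does not depend on the time zone) differ by at most 60 seconds.
#   $1 = timestamp from one node, $2 = timestamp from the other node
clock_skew_ok() {
    diff=$(($1 - $2))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le 60 ]
}

# Usage on the secondary (assumes SSH access to the primary):
#   clock_skew_ok "$(ssh primary.geo.example.com date -u +%s)" "$(date -u +%s)" \
#       || echo "clock skew exceeds 60 seconds" >&2
```

The SSH round trip adds a little delay to the measurement, so treat a result close to the limit as a reason to check the NTP setup rather than as a pass.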

Step 4. Initiate the replication process

Below we provide a script that connects to the primary server, replicates the database and creates the needed files for replication.

The directories used are the defaults that are set up in Omnibus. If you have changed any defaults or are using a source installation, adjust the directories and paths to match your setup.

Warning: Make sure to run this on the secondary server, as it removes all of PostgreSQL's data before running pg_basebackup.

  1. SSH into your GitLab secondary server and log in as root:

    sudo -i
    
  2. New for 9.4: Choose a database-friendly name for your secondary to use as the replication slot name. For example, if your domain is secondary.geo.example.com, you may use secondary_example as the slot name.

  3. Execute the command below to start a backup/restore and begin the replication:

    # Certificate and key currently used by GitLab, and connecting by FQDN
    gitlab-ctl replicate-geo-database --host=primary.geo.example.com --slot-name=secondary_example
    
    # Self-signed certificate and key, or connecting by IP
    gitlab-ctl replicate-geo-database --host=1.2.3.4 --slot-name=secondary_example --sslmode=verify-ca
    

    If PostgreSQL is listening on a non-standard port, add --port= as well.

    If you have to connect to a specific IP address, rather than the FQDN of the primary, to reach your PostgreSQL server, then you should pass --sslmode=verify-ca as well. This should only be the case if you have also used a self-signed certificate. verify-ca is not safe if you are connecting to an IP address and re-using an existing TLS certificate!

    Pass --sslmode=prefer if you are happy to skip PostgreSQL TLS authentication altogether (e.g., you know the network path is secure, or you are using a site-to-site VPN).

    You can read more details about each sslmode in the PostgreSQL documentation; the instructions above are carefully written to ensure protection against both passive eavesdroppers and active "man-in-the-middle" attackers.

    When prompted, enter the password you set up for the gitlab_replicator user in the first step.

    New for 9.4: Change the --slot-name to the name of the replication slot to be used on the primary database. The script will attempt to create the replication slot automatically if it does not exist.

    This command also takes a number of additional options. You can use --help to list them all, but here are a couple of tips:

    If you're setting up replication on a brand-new secondary that has no data, you may want to pass --no-wait --skip-backup to speed up the process - but be certain that you're running it against the right GitLab installation first! It will cause data loss otherwise.

    If you're repurposing an old server into a Geo secondary, you'll need to add --force to the command line.
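
The choice of --slot-name and --sslmode described above can be sketched as a small helper that prints the command without running it. Both function names are hypothetical. The sketch assumes that PostgreSQL slot names may contain only lowercase letters, numbers, and underscores, that a host consisting only of digits and dots is an IPv4 address, and that omitting --sslmode is correct for a reused certificate reached by FQDN, as in the first example command above.

```shell
#!/bin/sh
# Hypothetical helper: turn a host name into a replication slot name made up
# of lowercase letters, digits, and underscores only.
slot_name_for() {
    printf '%s' "$1" | LC_ALL=C tr '[:upper:]' '[:lower:]' | LC_ALL=C tr -c 'a-z0-9_' '_'
}

# Hypothetical helper: print (not run) the replicate-geo-database command.
#   $1 = primary host (FQDN or IPv4 address)
#   $2 = secondary FQDN, used only to derive the slot name
#   $3 = "reused" or "self-signed": the kind of certificate on the primary
replicate_command() {
    primary="$1"
    slot=$(slot_name_for "$2")
    cert="$3"

    # Treat a host made up only of digits and dots as an IPv4 address.
    case "$primary" in
        *[!0-9.]*) by_ip=no ;;
        *)         by_ip=yes ;;
    esac

    cmd="gitlab-ctl replicate-geo-database --host=$primary --slot-name=$slot"
    if [ "$cert" = "self-signed" ]; then
        cmd="$cmd --sslmode=verify-ca"
    elif [ "$by_ip" = yes ]; then
        # Reused certificate plus IP address: verify-ca would be unsafe here.
        echo "refusing: connect by FQDN when re-using an existing certificate" >&2
        return 1
    fi
    printf '%s\n' "$cmd"
}

# Usage:
#   replicate_command primary.geo.example.com secondary.geo.example.com reused
```

The derived slot name (secondary_geo_example_com for the example domain) is longer than the secondary_example suggested above. Either works, as long as the same name is used consistently for that secondary.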

The replication process is now complete.

External PostgreSQL instances

For installations using external PostgreSQL instances, the geo_primary_role and geo_secondary_role include configuration changes that must be applied manually.

The geo_primary_role makes configuration changes to pg_hba.conf and postgresql.conf on the primary:

# pg_hba.conf
# GitLab Geo Primary
host    replication gitlab_replicator <trusted secondary IP>/32     md5
# postgresql.conf
# Geo Primary Role
sql_replication_user = gitlab_replicator
wal_level = hot_standby
max_wal_senders = 10
wal_keep_segments = 50
max_replication_slots = 1 # number of secondary instances
hot_standby = on
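
With more than one secondary, pg_hba.conf needs one replication entry per secondary, and max_replication_slots should match their number. A minimal sketch that prints the entries in the format shown above follows; pg_hba_replication_lines is a hypothetical helper name, and the IP addresses in the usage line are the example values from Step 1.

```shell
#!/bin/sh
# Hypothetical helper: print one pg_hba.conf replication entry per secondary.
#   $1 = replication user, remaining arguments = secondary IPv4 addresses
pg_hba_replication_lines() {
    user="$1"
    shift
    for ip in "$@"; do
        printf 'host    replication %s %s/32     md5\n' "$user" "$ip"
    done
}

# Usage: review the output, then add it to pg_hba.conf by hand.
#   pg_hba_replication_lines gitlab_replicator 5.6.7.8 11.22.33.44
```

The helper only prints text. It does not edit pg_hba.conf or reload PostgreSQL, so those steps remain manual.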

The geo_secondary_role makes configuration changes to postgresql.conf and enables the Geo Log Cursor (geo_logcursor) and secondary tracking database on the secondary. It adds the following PostgreSQL settings for this database to the default settings:

# postgresql.conf
# Geo Secondary Role
wal_level = hot_standby
max_wal_senders = 10
wal_keep_segments = 10
hot_standby = on
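
Because these settings are applied by hand on external instances, it can help to read them back from the file afterwards. The following is a minimal sketch: pg_conf_value is a hypothetical helper name, the path in the usage line is a placeholder, and the parser assumes simple unquoted "name = value" lines like the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the value assigned to a setting in a
# postgresql.conf-style file. If the setting appears more than once,
# the last assignment is printed.
#   $1 = path to the file, $2 = setting name
pg_conf_value() {
    awk -v key="$2" '
        {
            sub(/#.*/, "")                      # strip comments
            eq = index($0, "=")
            if (eq == 0) next                   # skip lines without "="
            name  = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) found = value
        }
        END { if (found != "") print found }
    ' "$1"
}

# Usage (replace the path with the location of your postgresql.conf):
#   [ "$(pg_conf_value /path/to/postgresql.conf hot_standby)" = "on" ] \
#       || echo "hot_standby is not enabled" >&2
```

This reads the file only. It does not show what the running server uses, which may differ until PostgreSQL is restarted.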

Geo secondary nodes use a tracking database to keep track of replication status and recover automatically from some replication issues. Follow the instructions for enabling the tracking database on the secondary server.

MySQL replication

We don't support MySQL replication for GitLab Geo.

Troubleshooting

Read the troubleshooting document.

