PostgreSQL replication and failover with Omnibus GitLab

If you’re a Free user of GitLab self-managed, consider using a cloud-hosted solution. This document doesn’t cover installations from source.

If a setup with replication and failover isn’t what you were looking for, see the database configuration document for the Omnibus GitLab packages.

It’s recommended to read this document fully before attempting to configure PostgreSQL with replication and failover for GitLab.

Architecture

The Omnibus GitLab recommended configuration for a PostgreSQL cluster with replication and failover requires:

  • A minimum of three PostgreSQL nodes.
  • A minimum of three Consul server nodes.
  • A minimum of three PgBouncer nodes that track and handle primary database reads and writes.
    • An internal load balancer (TCP) to balance requests between the PgBouncer nodes.
  • Database Load Balancing enabled.
    • A local PgBouncer service configured on each PostgreSQL node. Note that this is separate from the main PgBouncer cluster that tracks the primary.

You also need to take into consideration the underlying network topology, making sure you have redundant connectivity between all database and GitLab instances so the network doesn't become a single point of failure.

note
As of GitLab 13.3, PostgreSQL 12 is shipped with Omnibus GitLab. Clustering for PostgreSQL 12 is supported only with Patroni. See the Patroni section for further details. Starting with GitLab 14.0, only PostgreSQL 12 is shipped with Omnibus GitLab, and thus Patroni becomes mandatory for replication and failover.

Database node

Each database node runs four services:

  • PostgreSQL: The database itself.
  • Patroni: Communicates with other Patroni services in the cluster and handles failover when an issue with the leader server occurs. The failover procedure consists of:
    • Selecting a new leader for the cluster.
    • Promoting the new node to leader.
    • Instructing remaining servers to follow the new leader node.
  • PgBouncer: A local pooler for the node. Used for read queries as part of Database Load Balancing.
  • Consul agent: Communicates with the Consul cluster, which stores the current Patroni state. The agent monitors the status of each node in the database cluster and tracks its health in a service definition on the Consul cluster.
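
As a rough sketch, the service layout above maps to gitlab.rb settings along the following lines. This is illustrative only: the role name, the Consul server addresses (10.6.0.x), and the explicit Patroni toggle are placeholders to be adjusted in the configuration steps later in this document.

# Sketch of a database node: PostgreSQL managed by Patroni, plus a Consul agent.
roles(['postgres_role'])
patroni['enable'] = true

# Consul agent: register the postgresql service and join the Consul servers.
consul['enable'] = true
consul['services'] = %w(postgresql)
consul['configuration'] = {
  retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13) # placeholder Consul server addresses
}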

Consul server node

The Consul server node runs the Consul server service. These nodes must have reached quorum and elected a leader before the Patroni cluster is bootstrapped; otherwise, database nodes wait until a Consul leader is elected.
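
As an illustration only (the role name and the addresses below are placeholders, not the authoritative configuration), a Consul server node is typically enabled in gitlab.rb along these lines:

# Sketch of a Consul server node.
roles(['consul_role'])
consul['configuration'] = {
  server: true,
  retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13) # placeholder addresses of the three Consul servers
}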

PgBouncer node

Each PgBouncer node runs two services:

  • PgBouncer: The database connection pooler itself.
  • Consul agent: Watches the status of the PostgreSQL service definition on the Consul cluster. If that status changes, Consul runs a script which updates the PgBouncer configuration to point to the new PostgreSQL leader node and reloads the PgBouncer service.
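
For illustration, and assuming default role names and placeholder addresses, a PgBouncer node that follows the primary through Consul can be sketched in gitlab.rb like this:

# Sketch of a PgBouncer node.
roles(['pgbouncer_role'])

# Consul agent: watch the postgresql service definition and join the Consul servers.
consul['enable'] = true
consul['watchers'] = %w(postgresql)
consul['configuration'] = {
  retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13) # placeholder Consul server addresses
}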

Connection flow

Each service in the package comes with a set of default ports. You may need to create specific firewall rules for the connections listed below.

There are several connection flows in this setup:

Primary

  • Application servers connect either directly to PgBouncer via its default port, or via a configured internal load balancer (TCP) that serves multiple PgBouncers.
  • PgBouncer connects to the primary database server’s PostgreSQL default port.
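
A minimal sketch of the application side of this flow, assuming placeholder addresses and the PgBouncer default port of 6432 (the host can be a single PgBouncer node or the internal load balancer in front of several):

# Sketch of the application servers' database connection via PgBouncer.
gitlab_rails['db_host'] = '10.6.0.20' # placeholder: PgBouncer node or internal load balancer
gitlab_rails['db_port'] = 6432        # PgBouncer default port
gitlab_rails['db_password'] = 'POSTGRESQL_USER_PASSWORD'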

Database Load Balancing

For read queries against data that hasn't been recently changed and is up to date on all database nodes:

  • Application servers connect to the local PgBouncer service via its default port on each database node in a round-robin approach.
  • Local PgBouncer connects to the local database server’s PostgreSQL default port.
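
A minimal sketch of enabling this read flow on the application servers, with placeholder database node addresses (each running its local PgBouncer):

# Sketch of Database Load Balancing across the database nodes' local PgBouncer services.
gitlab_rails['db_load_balancing'] = { 'hosts' => ['10.6.0.21', '10.6.0.22', '10.6.0.23'] }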

Replication

  • Patroni actively manages the running PostgreSQL processes and configuration.
  • PostgreSQL secondaries connect to the primary database server's PostgreSQL default port.
  • Consul servers and agents connect to each other's Consul default ports.
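
To see how replication is laid out at runtime, you can inspect the Patroni cluster state from any database node. Assuming a default Omnibus installation, the following lists the current leader and its replicas:

    sudo gitlab-ctl patroni members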

Setting it up

Required information

Before proceeding with configuration, you need to collect all the necessary information.

Network information

PostgreSQL doesn’t listen on any network interface by default. It needs to know which IP address to listen on to be accessible to other services. Similarly, PostgreSQL access is controlled based on the network source.

This is why you need:

  • The IP address of each node’s network interface. This can be set to 0.0.0.0 to listen on all interfaces. It cannot be set to the loopback address 127.0.0.1.
  • Network Address. This can be in subnet (that is, 192.168.0.0/255.255.255.0) or Classless Inter-Domain Routing (CIDR) (192.168.0.0/24) form.
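
For illustration, and assuming the 192.168.0.0/24 example network above, these two pieces of information typically end up in gitlab.rb keys on the database nodes similar to the following (the exact values belong in the configuration steps later):

postgresql['listen_address'] = '0.0.0.0' # or the node's own interface address
postgresql['md5_auth_cidr_addresses'] = %w(192.168.0.0/24) # networks allowed to authenticate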

Consul information

When using the default setup, the minimum configuration requires:

  • CONSUL_USERNAME. The default user for Omnibus GitLab is gitlab-consul.
  • CONSUL_DATABASE_PASSWORD. Password for the database user.
  • CONSUL_PASSWORD_HASH. This is a hash generated from the Consul username/password pair. It can be generated with:

     sudo gitlab-ctl pg-password-md5 CONSUL_USERNAME
    
  • CONSUL_SERVER_NODES. The IP addresses or DNS records of the Consul server nodes.

A few notes on the service itself:

  • The service runs under a system account, by default gitlab-consul.
  • If you are using a different username, you have to specify it through the CONSUL_USERNAME variable.
  • Passwords are stored in the following locations:
    • /etc/gitlab/gitlab.rb: hashed
    • /var/opt/gitlab/pgbouncer/pg_auth: hashed
    • /var/opt/gitlab/consul/.pgpass: plaintext

PostgreSQL information

When configuring PostgreSQL, we do the following:

  • Set max_replication_slots to double the number of database nodes. Patroni uses one extra slot per node when initiating replication.
  • Set max_wal_senders to one more than the allocated number of replication slots in the cluster. This prevents replication from using up all of the available database connections.

In this document, we assume three database nodes, which makes this configuration:

patroni['postgresql']['max_replication_slots'] = 6
patroni['postgresql']['max_wal_senders'] = 7

As previously mentioned, prepare the network subnets that need permission to authenticate with the database. You also need to have the IP addresses or DNS records of Consul server nodes on hand.

You need the following password information for the application’s database user:

  • POSTGRESQL_USERNAME. The default user for Omnibus GitLab is gitlab.
  • POSTGRESQL_USER_PASSWORD. The password for the database user.
  • POSTGRESQL_PASSWORD_HASH. This is a hash generated from the username/password pair. It can be generated with:

    sudo gitlab-ctl pg-password-md5 POSTGRESQL_USERNAME
    
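As a sketch of where this hash is used, it is typically placed in gitlab.rb on the database nodes, for example:

postgresql['sql_user_password'] = 'POSTGRESQL_PASSWORD_HASH' # the hash generated above, not the plaintext password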

Patroni information

You need the following password information for the Patroni API:

  • PATRONI_API_USERNAME. A username for basic auth to the API.
  • PATRONI_API_PASSWORD. A password for basic auth to the API.
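
These credentials are typically set in gitlab.rb on the database nodes. The key names below are an assumption based on a default Omnibus setup; check them against the configuration steps for your version:

patroni['username'] = 'PATRONI_API_USERNAME'
patroni['password'] = 'PATRONI_API_PASSWORD'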

PgBouncer information

When using a default setup, the minimum configuration requires:

  • PGBOUNCER_USERNAME. The default user for Omnibus GitLab is pgbouncer.
  • PGBOUNCER_PASSWORD. This is the password for the PgBouncer service.
  • PGBOUNCER_PASSWORD_HASH. This is a hash generated from the PgBouncer username/password pair. It can be generated with:

    sudo gitlab-ctl pg-password-md5 PGBOUNCER_USERNAME
    
  • PGBOUNCER_NODE. The IP address or FQDN of the node running PgBouncer.

A few things to remember about the service itself:

  • The service runs as the same system account as the database. In the package, this is gitlab-psql by default.
  • If you use a non-default user account for the PgBouncer service (by default pgbouncer), you need to specify this username.
  • Passwords are stored in the following locations:
    • /etc/gitlab/gitlab.rb: hashed, and in plain text
    • /var/opt/gitlab/pgbouncer/pg_auth: hashed
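
As a sketch, and assuming the default gitlab-consul and pgbouncer user names, the hashes collected above are typically wired into gitlab.rb on the PgBouncer nodes like this:

# Users that PgBouncer authenticates locally, keyed by username, with their password hashes.
pgbouncer['users'] = {
  'gitlab-consul': {
    password: 'CONSUL_PASSWORD_HASH'
  },
  'pgbouncer': {
    password: 'PGBOUNCER_PASSWORD_HASH'
  }
}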

Installing Omnibus GitLab

First, download and install Omnibus GitLab on each node.

Make sure you install the necessary dependencies from step 1 and add the GitLab package repository from step 2. When installing the GitLab package, do not supply the EXTERNAL_URL value.

Configuring the Database nodes

  1. Make sure to configure the Consul nodes.
  2. Make sure you collect CONSUL_SERVER_NODES, PGBOUNCER_PASSWORD_HASH, POSTGRESQL_PASSWORD_HASH, the number of database nodes, and the network address before executing the next step.

Configuring Patroni cluster

You must enable Patroni explicitly to be able to use it (with patroni['enable'] = true).

Any PostgreSQL configuration item that controls replication, for example wal_level, max_wal_senders, or others, is strictly controlled by Patroni. These settings override the original settings that you make with the postgresql[...] configuration key. Hence, they are all separated and placed under patroni['postgresql'][...]. This behavior is limited to replication. Patroni honors any other PostgreSQL configuration that was made with the postgresql[...] configuration key. For example, max_wal_senders by default is set to 5. If you wish to change this, you must set it with the patroni['postgresql']['max_wal_senders'] configuration key.
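
To make the split concrete, here is a small illustrative sketch: replication-related settings go under patroni['postgresql'][...], while other PostgreSQL settings keep their usual postgresql[...] keys (the values shown are examples only):

# Replication-related: controlled by Patroni, so set it under patroni['postgresql'][...].
patroni['postgresql']['max_wal_senders'] = 7

# Not replication-related: the regular postgresql[...] key is honored as usual.
postgresql['shared_buffers'] = '2GB' # example value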

note
The configuration of a Patroni node is very similar to that of a repmgr node, but shorter. When Patroni is enabled, you can ignore any PostgreSQL replication settings (they are overwritten), and you can remove any repmgr[...] or repmgr-specific configuration as well.