- Use cases
- How it works
- Requirements for running Geo
- Setup instructions
- Post-installation documentation
- Remove Geo site
- Disable Geo
- Frequently Asked Questions
- Log files
Geo is the solution for widely distributed development teams and for providing a warm-standby as part of a disaster recovery strategy.
Fetching large repositories can take a long time for teams located far from a single GitLab instance.
Geo provides local, read-only sites of your GitLab instances. This can reduce the time it takes to clone and fetch large repositories, speeding up development.
For a video introduction to Geo, see Introduction to GitLab Geo - GitLab Features.
To make sure you’re using the right version of the documentation, go to the Geo page on GitLab.com and choose the appropriate release from the Switch branch/tag dropdown list. For example,
Geo uses a set of defined terms that are described in the Geo Glossary. Be sure to familiarize yourself with those terms.
Implementing Geo provides the following benefits:
- Reduce the time it takes your distributed developers to clone and fetch large repositories and projects from minutes to seconds.
- Enable all of your developers to contribute ideas and work in parallel, no matter where they are.
- Balance the read-only load between your primary and secondary sites.
In addition, it:
- Can be used for cloning and fetching projects, in addition to reading any data available in the GitLab web interface (see limitations).
- Overcomes slow connections between distant offices, saving time by improving speed for distributed teams.
- Helps reduce the loading time for automated tasks, custom integrations, and internal workflows.
- Can quickly fail over to a secondary site in a disaster recovery scenario.
- Allows planned failover to a secondary site.
- Read-only secondary sites: Maintain one primary GitLab site while still enabling read-only secondary sites for each of your distributed teams.
- Authentication system hooks: Secondary sites receive all authentication data (like user accounts and logins) from the primary instance.
- An intuitive UI: Secondary sites use the same web interface your team has grown accustomed to. In addition, there are visual notifications that block write operations and make it clear that a user is on a secondary site.
Your Geo instance can be used for cloning and fetching projects, in addition to reading any data. This makes working with large repositories over large distances much faster.
When Geo is enabled, the:
- Original instance is known as the primary site.
- Replicated read-only sites are known as secondary sites.
Keep in mind that:
- Secondary sites talk to the primary site to:
  - Get user data for logins (API).
  - Replicate repositories, LFS Objects, and Attachments (HTTPS + JWT).
- The primary site doesn’t talk to secondary sites to notify them of changes (API).
- You can push directly to a secondary site (for both HTTP and SSH, including Git LFS).
- There are limitations when using Geo.
The following diagram illustrates the underlying architecture of Geo.
In this diagram:
- The diagram shows the primary site and the details of one secondary site.
- Writes to the database can only be performed on the primary site. A secondary site receives database updates via PostgreSQL streaming replication.
- If present, the LDAP server should be configured to replicate for Disaster Recovery scenarios.
- A secondary site performs different types of synchronization against the primary site, using a special authorization protected by JWT:
  - Repositories are cloned/updated via Git over HTTPS.
  - Attachments, LFS objects, and other files are downloaded via HTTPS using a private API endpoint.
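To illustrate what a JWT-protected request involves, the following is a minimal sketch of signing and verifying an HS256 token with the Python standard library. It assumes a secret shared between the primary and secondary sites and uses hypothetical claim names (data, exp). It is not GitLab's actual token format or implementation.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    # JWTs use base64url encoding with the "=" padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url_encode removed.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Return an HS256-signed JWT for the given claims.

    Assumption: both sites share `secret`; the claim names are hypothetical.
    """
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_jwt(token: str, secret: bytes) -> dict:
    """Check the signature and expiry, returning the claims or raising ValueError."""
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    signing_input = f"{header_b64}.{claims_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims
```

In this sketch, the requesting site would call sign_jwt before each request and the serving site would call verify_jwt before responding. A short exp claim limits how long a leaked token stays useful.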
From the perspective of a user performing Git operations:
- The primary site behaves as a full read-write GitLab instance.
- Secondary sites are read-only but proxy Git push operations to the primary site. This makes secondary sites appear to support push operations themselves.
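The read/write split above can be sketched as a small routing decision. The operation labels and the route_git_operation function are hypothetical; the sketch only shows that reads are served from the secondary site's replica while pushes end up on the primary site.

```python
from enum import Enum


class Action(Enum):
    SERVE_LOCALLY = "serve from the secondary site's replica"
    FORWARD_TO_PRIMARY = "forward to the primary site"


# Hypothetical operation labels, for illustration only.
READ_OPERATIONS = frozenset({"clone", "fetch", "pull"})
WRITE_OPERATIONS = frozenset({"push"})


def route_git_operation(operation: str) -> Action:
    """Decide how a secondary site handles a Git operation."""
    if operation in READ_OPERATIONS:
        return Action.SERVE_LOCALLY
    if operation in WRITE_OPERATIONS:
        # The secondary site is read-only, so writes must reach the primary site.
        return Action.FORWARD_TO_PRIMARY
    raise ValueError(f"unknown Git operation: {operation!r}")
```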
To simplify the diagram, some necessary components are omitted.
A secondary site needs two different PostgreSQL databases:
- A read-only database instance that streams data from the main GitLab database.
- Another database instance used internally by the secondary site to record what data has been replicated.
In secondary sites, there is an additional daemon: Geo Log Cursor.
The following are required to run Geo:
- An operating system that supports OpenSSH 6.9 or later (needed for fast lookup of authorized SSH keys in the database). The following operating systems are known to ship with a current version of OpenSSH:
- PostgreSQL 12 or 13 with Streaming Replication
- Note: PostgreSQL 12 is deprecated and is removed in GitLab 16.0.
- Git 2.9 or later
- Git-lfs 2.4.2 or later on the user side when using LFS
- All sites must run the same GitLab version.
- All sites must run the same PostgreSQL version.
- If using different operating system versions between Geo sites, check OS locale data compatibility across Geo sites to avoid silent corruption of database indexes.
- All sites must define the same repository storages.
Additionally, check the GitLab minimum requirements, and use the latest version of GitLab for a better experience.
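A pre-flight check for the version and storage requirements might look like the following sketch. The Site record and the convention that the first entry is the primary site are assumptions for illustration; in practice you would gather these values from each site yourself. It does not cover the OS locale check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    gitlab_version: str
    postgresql_version: str
    repository_storages: frozenset[str]


def check_geo_requirements(sites: list[Site]) -> list[str]:
    """Return a list of mismatches; an empty list means the checked values agree.

    Assumption: sites[0] is the primary site and serves as the reference.
    """
    primary, *secondaries = sites
    problems = []
    for site in secondaries:
        if site.gitlab_version != primary.gitlab_version:
            problems.append(
                f"{site.name}: GitLab {site.gitlab_version}, primary has {primary.gitlab_version}"
            )
        if site.postgresql_version != primary.postgresql_version:
            problems.append(
                f"{site.name}: PostgreSQL {site.postgresql_version}, "
                f"primary has {primary.postgresql_version}"
            )
        if site.repository_storages != primary.repository_storages:
            # Symmetric difference lists storages defined on only one of the two sites.
            differing = sorted(primary.repository_storages ^ site.repository_storages)
            problems.append(f"{site.name}: repository storages differ ({', '.join(differing)})")
    return problems
```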
The following table lists basic ports that must be open between the primary and secondary sites for Geo. To simplify failovers, you should open ports in both directions.
| Source site | Source port | Destination site | Destination port | Protocol |
|---|---|---|---|---|
See the full list of ports used by GitLab in Package defaults.
HTTP requests from any Geo secondary site to the primary Geo site use the Internal URL of the primary Geo site. If this is not explicitly defined in the primary Geo site settings in the Admin Area, the public URL of the primary site is used.
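The fallback rule for Geo HTTP requests can be expressed as a short function. This is a sketch: the function name is hypothetical, and treating a blank internal URL the same as an undefined one is an assumption.

```python
from typing import Optional


def primary_url_for_geo_requests(internal_url: Optional[str], public_url: str) -> str:
    """Pick the primary site URL that a secondary site's HTTP requests use.

    The internal URL is preferred; if it is unset (None or blank, an assumption),
    fall back to the public URL. A trailing slash is dropped for consistency.
    """
    if internal_url is not None and internal_url.strip():
        return internal_url.strip().rstrip("/")
    return public_url.strip().rstrip("/")
```

The hostnames you pass in are your own; for example, an internal address on a private network lets Geo traffic bypass a public load balancer while users keep the public URL.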
To update the internal URL of the primary Geo site:
- On the left sidebar, select Search or go to.
- Select Admin Area.
- On the left sidebar, select Geo > Sites.
- Select Edit on the primary site.
- Change the Internal URL, then select Save changes.
The tracking database instance stores metadata that controls what needs to be updated on the disk of the local instance. For example:
- Download new assets.
- Fetch new LFS Objects.
- Fetch changes from a repository that has recently been updated.
Because the replicated database instance is read-only, we need this additional database instance for each secondary site.
The Geo Log Cursor daemon:
- Reads a log of events replicated by the primary site to the secondary database instance.
- Updates the Geo Tracking Database instance with changes that must be executed.
When something is marked to be updated in the tracking database instance, asynchronous jobs running on the secondary site execute the required operations and update the state.
This architecture makes GitLab resilient to connectivity issues between the sites. No matter how long the secondary site is disconnected from the primary site, it can replay all the events in the correct order and become synchronized with the primary site again.
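The event replay described above can be sketched as a cursor over an ordered event log. Event, TrackingDatabase, and process_events are hypothetical names, and the pending list stands in for the entries that asynchronous jobs would later pick up; this is not the actual Geo Log Cursor implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: int   # increasing ID assigned when the event is logged
    kind: str       # hypothetical, for example "repository_updated"
    resource: str   # hypothetical, for example "project/42"


@dataclass
class TrackingDatabase:
    last_event_id: int = 0  # the cursor position
    pending: list[tuple[str, str]] = field(default_factory=list)  # work for async jobs


def process_events(event_log: list[Event], tracking: TrackingDatabase) -> int:
    """Replay every event newer than the cursor, oldest first.

    Returns the number of events applied in this call.
    """
    applied = 0
    for event in sorted(event_log, key=lambda e: e.event_id):
        if event.event_id <= tracking.last_event_id:
            continue  # already recorded in an earlier run
        tracking.pending.append((event.kind, event.resource))
        tracking.last_event_id = event.event_id  # advance only after recording
        applied += 1
    return applied
```

Because the cursor is stored alongside the recorded work, a secondary site that was offline resumes from its last processed event ID, however long the gap was.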
- Pushing directly to a secondary site redirects (for HTTP) or proxies (for SSH) the request to the primary site instead of handling it directly. The limitation is that you cannot use Git over HTTP with credentials embedded in the URI, for example, https://user:email@example.com. For more information, see how to use a Geo Site.
- The primary site has to be online for OAuth login to happen. Existing sessions and Git are not affected. Support for the secondary site to use an OAuth provider independent from the primary is being planned.
- The installation takes multiple manual steps that together can take about an hour depending on circumstances. Consider using the GitLab Environment Toolkit Terraform and Ansible scripts to deploy and operate production GitLab instances based on our Reference Architectures, including automation of common daily tasks. Epic 1465 proposes to improve Geo installation even more.
- Real-time updates of issues/merge requests (for example, via long polling) don’t work on the secondary site.
- Using Geo secondary sites to accelerate runners is not officially supported. Support for this functionality is planned and can be tracked in epic 9779. If a replication lag occurs between the primary and secondary site, and the pipeline ref is not available on the secondary site when the job is executed, the job will fail.
- GitLab Runners cannot register with a secondary site. Support for this is planned for the future.
- Selective synchronization only limits what repositories and files are replicated. The entire PostgreSQL data is still replicated. Selective synchronization is not built to accommodate compliance or export control use cases.
- Pages access control doesn’t work on secondaries. See GitLab issue #9336 for details.
- Disaster recovery for multi-secondary sites causes downtime due to the complete re-synchronization and re-configuration of all non-promoted secondaries.
- For Git over SSH, to make the project clone URL display correctly regardless of which site you are browsing, secondary sites must use the same port as the primary. GitLab issue #339262 proposes to remove this limitation.
- Git push over SSH against a secondary site does not work for pushes over 1.86 GB. GitLab issue #413109 tracks this bug.
- Backups cannot be run on secondaries.
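The selective synchronization limitation can be made concrete with a small decision function. The item kinds and the top-level-group matching rule are hypothetical simplifications; the point is that database rows are always replicated, which is why selective synchronization cannot keep data off a secondary site for compliance purposes.

```python
def is_replicated(kind: str, project_path: str, selected_groups: set[str]) -> bool:
    """Decide whether a secondary site replicates an item under selective sync.

    Hypothetical model: `kind` is "database_row", "repository", or "file", and a
    project is selected when its top-level group is in `selected_groups`.
    """
    if kind == "database_row":
        # PostgreSQL streaming replication copies the entire database regardless.
        return True
    if not selected_groups:
        # Assumption: with no selection configured, everything is replicated.
        return True
    top_level_group = project_path.split("/", 1)[0]
    return top_level_group in selected_groups
```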
For setup instructions, see Setting up Geo.
After installing GitLab on the secondary sites and performing the initial configuration, see the following documentation for post-installation information.
For information on configuring Geo, see Geo configuration.
For information on how to update your Geo sites to the latest GitLab version, see Upgrading the Geo sites.
Introduced in GitLab 13.2.
Pausing and resuming replication is done through a command-line tool from the node in the secondary site where the postgresql service is enabled.
If postgresql is on a standalone database node, ensure that gitlab.rb on that node contains the configuration line gitlab_rails['geo_node_name'] = 'node_name', where node_name is the same as the geo_node_name on the application node.
To pause (from the secondary site):
To resume (from the secondary site):
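To confirm that a standalone database node and the application node agree on geo_node_name, a quick check could parse both gitlab.rb files. This sketch uses a regular expression that only recognizes simple one-line assignments like gitlab_rails['geo_node_name'] = 'node_name'; the function names are hypothetical, and because gitlab.rb is Ruby, it can set the value in ways this pattern misses.

```python
import re
from typing import Optional

# Matches simple one-line assignments such as:
#   gitlab_rails['geo_node_name'] = 'node_name'
# Commented-out lines are ignored because the line must start with gitlab_rails.
GEO_NODE_NAME_PATTERN = re.compile(
    r"""^\s*gitlab_rails\[['"]geo_node_name['"]\]\s*=\s*['"]([^'"]+)['"]""",
    re.MULTILINE,
)


def read_geo_node_name(gitlab_rb: str) -> Optional[str]:
    """Return the geo_node_name set in gitlab.rb text, or None if it is absent."""
    matches = GEO_NODE_NAME_PATTERN.findall(gitlab_rb)
    # In Ruby the last assignment wins, so report the final match.
    return matches[-1] if matches else None


def geo_node_names_match(application_rb: str, database_rb: str) -> bool:
    """True when both files set geo_node_name to the same value."""
    app_name = read_geo_node_name(application_rb)
    return app_name is not None and app_name == read_geo_node_name(database_rb)
```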
For information on configuring Geo for multiple nodes, see Geo for multiple servers.
For information on configuring Geo with Object storage, see Geo with Object storage.
For information on using Geo in disaster recovery situations to mitigate data-loss and restore services, see Disaster Recovery.
For more information on how to replicate the Container Registry, see Container Registry for a secondary site.
For more information on using Geo proxying on secondary sites, see Geo proxying for secondary sites.
For more information on configuring Single Sign-On (SSO), see Geo with Single Sign-On (SSO).
For more information on configuring LDAP, see Geo with Single Sign-On (SSO) > LDAP.
For more information on Geo security, see Geo security review.
For more information on tuning Geo, see Tuning Geo.
For an example of how to set up a location-aware Git remote URL with AWS Route53, see Location-aware Git remote URL with AWS Route53.
When a secondary site is set up, it starts replicating missing data from the primary site in a process known as backfill. You can monitor the synchronization process on each Geo site from the primary site’s Geo Nodes dashboard in your browser.
Failures that happen during a backfill are scheduled to be retried at the end of the backfill.
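The retry behavior of backfill can be sketched as a two-phase loop: sync every item once, then retry the failures after the first pass completes. The sync callback and the max_retry_rounds limit are assumptions for illustration, not Geo settings.

```python
from typing import Callable, Iterable


def backfill(
    items: Iterable[str],
    sync: Callable[[str], bool],
    max_retry_rounds: int = 3,
) -> list[str]:
    """Sync every item once, then retry the failures after the full pass.

    `sync` returns True on success. Returns the items that still failed
    after `max_retry_rounds` extra attempts (a hypothetical limit).
    """
    # First pass: attempt everything, remembering failures instead of retrying inline.
    failed = [item for item in items if not sync(item)]
    # Retry rounds run only after the whole first pass has finished.
    for _ in range(max_retry_rounds):
        if not failed:
            break
        failed = [item for item in failed if not sync(item)]
    return failed
```

Deferring retries keeps one slow or broken item from holding up the rest of the initial synchronization.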
For more information on removing a Geo site, see Removing secondary Geo sites.
To find out how to disable Geo, see Disabling Geo.
For answers to common questions, see the Geo FAQ.
Geo stores structured log messages in a geo.log file. For Linux package installations, this file is at
This file contains information about when Geo attempts to sync repositories and files. Each line in the file contains a separate JSON entry that can be ingested into tools such as Elasticsearch or Splunk.
This message shows that Geo detected that a repository update was needed for project
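Because each line of geo.log is a standalone JSON entry, the file can also be filtered with a few lines of code. This sketch skips blank or malformed lines; the fields you filter on depend on your actual log entries, so any key passed to entries_matching is an assumption you should confirm against your own logs.

```python
import json
from typing import Any, Iterable, Iterator


def parse_geo_log(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one dict per JSON log line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # for example, a partially written final line
        if isinstance(entry, dict):
            yield entry


def entries_matching(
    entries: Iterable[dict[str, Any]], key: str, value: Any
) -> list[dict[str, Any]]:
    """Keep entries whose `key` field equals `value` (field names are assumptions)."""
    return [entry for entry in entries if entry.get(key) == value]
```

For example, pass an open file object for your geo.log to parse_geo_log, then narrow the results with entries_matching using a field you have seen in the file.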
For troubleshooting steps, see Geo Troubleshooting.