Elasticsearch integration

This page describes how to enable Advanced Search. When enabled, Advanced Search provides faster search response times and improved search features.

Version requirements

Elasticsearch version requirements

Support for Elasticsearch 6.8 was removed in GitLab 15.0.

Advanced Search works with the following versions of Elasticsearch.

GitLab version          Elasticsearch version
GitLab 15.0 or later    Elasticsearch 7.x - 8.x
GitLab 13.9 - 14.10     Elasticsearch 6.8 - 7.x
GitLab 13.3 - 13.8      Elasticsearch 6.4 - 7.x
GitLab 12.7 - 13.2      Elasticsearch 6.x - 7.x
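
The table above can be turned into a quick lookup when scripting an upgrade check. The following is a minimal sketch, not part of GitLab: the function name supported_es_range is hypothetical, and it assumes the GitLab version is passed as MAJOR.MINOR (for example, 14.10).

```shell
# Hypothetical helper: print the Elasticsearch range that the table above
# lists for a given GitLab MAJOR.MINOR version.
supported_es_range() {
  version=$1
  # Treat a bare major version such as "15" as "15.0".
  case "$version" in
    *.*) ;;
    *) version="$version.0" ;;
  esac
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  # Encode MAJOR.MINOR as one integer so the ranges compare numerically.
  v=$((major * 100 + minor))
  if   [ "$v" -ge 1500 ]; then echo "7.x - 8.x"
  elif [ "$v" -ge 1309 ]; then echo "6.8 - 7.x"
  elif [ "$v" -ge 1303 ]; then echo "6.4 - 7.x"
  elif [ "$v" -ge 1207 ]; then echo "6.x - 7.x"
  else
    # Versions earlier than 12.7 do not appear in the table.
    echo "not listed"
    return 1
  fi
}
```

For example, supported_es_range 14.10 prints 6.8 - 7.x. Encoding the version as a single integer keeps 14.10 ordered after 14.9, which a plain string comparison would get wrong.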

Advanced Search follows Elasticsearch’s End of Life Policy. When we change Elasticsearch supported versions in GitLab, we announce them in deprecation notes in monthly release posts before we remove them.

OpenSearch version requirements

GitLab version          OpenSearch version
GitLab 15.0 or later    OpenSearch 1.x or later

If you are using a compatible version and, after connecting to OpenSearch, you get the message "Elasticsearch version not compatible", unpause indexing.

System requirements

Elasticsearch requires additional resources to those documented in the GitLab system requirements.

Memory, CPU, and storage resource amounts vary depending on the amount of data you index into the Elasticsearch cluster. Heavily used Elasticsearch clusters may require more resources. According to Elasticsearch official guidelines, each node should have:

  • Memory: 8 GiB (minimum).
  • CPU: Modern processor with multiple cores. GitLab.com has minimal CPU requirements for Elasticsearch. Multiple cores provide extra concurrency, which is more beneficial than faster CPUs.
  • Storage: Use SSD storage. The total storage size of all Elasticsearch nodes is about 50% of the total size of your Git repositories. It includes one primary and one replica. The estimate_cluster_size Rake task (introduced in GitLab 13.10) uses total repository size to estimate the Advanced Search storage requirements.
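
As a rough illustration of the storage guideline above, the sketch below applies the "about 50% of total repository size" rule of thumb. The function name estimate_es_storage_gib is hypothetical, and the result is only a planning figure. On a running instance, the estimate_cluster_size Rake task mentioned above is the better source.

```shell
# Hypothetical helper: estimate total Elasticsearch storage (in GiB) as about
# 50% of the total Git repository size (in GiB), rounded up.
# On a live Omnibus instance, prefer the Rake task instead:
#   sudo gitlab-rake gitlab:elastic:estimate_cluster_size
estimate_es_storage_gib() {
  repo_gib=$1
  # Integer arithmetic; adding 1 before halving rounds odd sizes up.
  echo $(( (repo_gib + 1) / 2 ))
}
```

For example, estimate_es_storage_gib 100 prints 50. According to the guideline, that figure already covers one primary and one replica.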

Install Elasticsearch

Elasticsearch is not included in the Omnibus packages or when you install from source. You must install it separately and ensure you select a version that is compatible with your GitLab version. Detailed information on how to install Elasticsearch is out of the scope of this page.

You can install Elasticsearch yourself, or use a cloud-hosted offering such as Elasticsearch Service (available on AWS, GCP, or Azure) or the Amazon OpenSearch Service.

You should install Elasticsearch on a separate server. Running Elasticsearch on the same server as GitLab is not recommended and can cause a degradation in GitLab instance performance.

For a single node Elasticsearch cluster, the functional cluster health status is always yellow due to the allocation of the primary shard. Elasticsearch cannot assign replica shards to the same node as primary shards.
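
To confirm the health status described above, you can query the Elasticsearch cluster health API. The following is a minimal sketch: the helper name health_status is hypothetical, and the commented curl call assumes an unauthenticated cluster at http://localhost:9200.

```shell
# Hypothetical helper: read a _cluster/health JSON response on stdin and
# print the value of its "status" field (green, yellow, or red).
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Assumption: Elasticsearch listens on localhost:9200 without authentication.
# curl -s "http://localhost:9200/_cluster/health" | health_status
```

On a single-node cluster this is expected to print yellow, because the replica shards cannot be assigned.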

The search index updates after you:

Upgrade to a new Elasticsearch major version

Version history
  • Elasticsearch 6.8 support is removed with GitLab 15.0.
  • Upgrading from GitLab 14.10 to 15.0 requires that you are using any version of Elasticsearch 7.x.

You are not required to change the GitLab configuration when you upgrade Elasticsearch.

Elasticsearch repository indexer

To index Git repository data, GitLab uses an indexer written in Go.

Depending on your GitLab version, there are different installation procedures for the Go indexer:

Omnibus GitLab

Starting with GitLab 11.8, the Go indexer is included in Omnibus GitLab. The former Ruby-based indexer was removed in GitLab 12.3.

From source

First, install the dependencies. Then build and install the indexer itself.

Install dependencies

This project relies on International Components for Unicode (ICU) for text encoding. Ensure the ICU development packages for your platform are installed before you run make.

Debian / Ubuntu

To install on Debian or Ubuntu, run:

sudo apt install libicu-dev

CentOS / RHEL

To install on CentOS or RHEL, run:

sudo yum install libicu-devel

macOS
note
You must first install Homebrew.

To install on macOS, run:

brew install icu4c
export PKG_CONFIG_PATH="/usr/local/opt/icu4c/lib/pkgconfig:$PKG_CONFIG_PATH"

Build and install

To build and install the indexer, run:

indexer_path=/home/git/gitlab-elasticsearch-indexer

# Run the installation task for gitlab-elasticsearch-indexer:
sudo -u git -H bundle exec rake gitlab:indexer:install[$indexer_path] RAILS_ENV=production
cd $indexer_path && sudo make install

The gitlab-elasticsearch-indexer is installed to /usr/local/bin.

You can change the installation path with the PREFIX environment variable. If you do so, pass the -E flag to sudo so that the variable is preserved.

Example:

PREFIX=/usr sudo -E make install
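
After a custom-prefix install, it can help to check where the binary ended up. The sketch below assumes the Makefile installs into $PREFIX/bin, which matches the default /usr/local/bin location but is not confirmed here for other prefixes. The function name indexer_binary_path is hypothetical.

```shell
# Hypothetical helper: print the path where the indexer binary is expected
# for a given PREFIX (assumption: make install copies it to $PREFIX/bin).
indexer_binary_path() {
  prefix=${1:-/usr/local}
  echo "$prefix/bin/gitlab-elasticsearch-indexer"
}

# Example check after "PREFIX=/usr sudo -E make install":
# [ -x "$(indexer_binary_path /usr)" ] && echo "indexer found"
```

If the binary is not at the printed path, inspect the output of make install rather than relying on this assumption.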

After installation, be sure to enable Elasticsearch.

note
If you see an error such as Permission denied - /home/git/gitlab-elasticsearch-indexer/ while indexing, you may need to set the production -> elasticsearch -> indexer_path setting in your gitlab.yml file to /usr/local/bin/gitlab-elasticsearch-indexer, which is where the binary is installed.

Enable Advanced Search

For GitLab instances with more than 50 GB of repository data, you can follow the instructions for indexing large instances efficiently below.

To enable Advanced Search, you must have administrator access to GitLab:

  1. On the top bar, select Menu > Admin.
  2. On the left sidebar, select Settings > Advanced Search.

    note
    To see the Advanced Search section, you need an active GitLab Premium license.
  3. Configure the Advanced Search settings for your Elasticsearch cluster. Do not enable Search with Elasticsearch enabled yet.
  4. Enable Elasticsearch indexing and select Save changes. This creates an empty index if one does not already exist.
  5. Select Index all projects.
  6. Select Check progress in the confirmation message to see the status of the background jobs.
  7. Personal snippets must be indexed using another Rake task:

    # Omnibus installations
    sudo gitlab-rake gitlab:elastic:index_snippets
    
    # Installations from source
    bundle exec rake gitlab:elastic:index_snippets RAILS_ENV=production
    
  8. After indexing completes, enable Search with Elasticsearch enabled and select Save changes.

note
When your Elasticsearch cluster is down while Elasticsearch is enabled, you might have problems updating documents such as issues because your instance queues a job to index the change, but cannot find a valid Elasticsearch cluster.
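
Because updates can fail while the cluster is unreachable, a quick reachability check before enabling search may be useful. This is a minimal sketch and not a GitLab feature: the function name es_reachable is hypothetical, the URL in the usage line is a placeholder, and the check assumes curl is installed and the cluster does not require authentication.

```shell
# Hypothetical helper: succeed only if the given Elasticsearch URL answers
# an HTTP request within 5 seconds.
es_reachable() {
  # -f: treat HTTP error responses as failures
  # -sS: hide progress output but keep error messages (discarded below)
  # --max-time: give up instead of hanging on an unreachable host
  curl -fsS --max-time 5 "$1" >/dev/null 2>&1
}

# Usage (placeholder URL):
# es_reachable "http://localhost:9200/" && echo "cluster reachable"
```

A failed check only shows that the URL did not respond from this host. It does not indicate why.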

Advanced Search configuration

The following Elasticsearch settings are available:

Parameter Description
Elasticsearch indexing Enables or disables Elasticsearch indexing and creates an empty index if one does not already exist. For example, you may want to enable indexing but disable search to give the index time to be fully completed. This option has no impact on existing data. It only enables or disables the background indexer, which tracks data changes and ensures new data is indexed.
Pause Elasticsearch indexing Enables or disables temporary indexing pause. This is useful for cluster migration/reindexing. All changes are still tracked, but they are not committed to the Elasticsearch index until resumed.
Search with Elasticsearch enabled Enables or disables using Elasticsearch in search.
URL The URL of your Elasticsearch instance. Use a comma-separated list to support clustering (for example, http://host1, https://host2:9200). If your Elasticsearch instance is password-protected, use the Username and Password fields described below. Alternatively, use inline credentials such as http://<username>:<password>@<elastic_host>:9200/.
Username The username of your Elasticsearch instance.
Password The password of your Elasticsearch instance.
Number of Elasticsearch shards Elasticsearch indices are split into multiple shards for performance reasons. In general, you should use at least 5 shards, and indices with tens of millions of documents need to have more shards (see below). Changes to this value do not take effect until the index is recreated. You can read more about tradeoffs in the Elasticsearch documentation.
Number of Elasticsearch replicas Each Elasticsearch shard can have a number of replicas. These are a complete copy of the shard, and can provide increased query performance or resilience against hardware failure. Increasing this value increases total disk space required by the index.
Limit the number of namespaces and projects that can be indexed Enabling this allows you to select namespaces and projects to index. All other namespaces and projects use database search instead. If you enable this option but do not select any namespaces or projects, none are indexed. Read more below.
Using AWS hosted Elasticsearch with IAM credentials Sign your Elasticsearch requests using AWS IAM authorization, AWS EC2 Instance Profile Credentials, or AWS ECS Tasks Credentials. Please refer to Identity and Access Management in Amazon OpenSearch Service for details of AWS hosted OpenSearch domain access policy configuration.
AWS Region The AWS region in which your OpenSearch Service is located.
AWS Access Key The AWS access key.
AWS Secret Access Key The AWS secret access key.
Maximum file size indexed See the explanation in instance limits.
Maximum field length See the explanation in instance limits.
Maximum bulk request size (MiB) The Maximum Bulk Request size is used by the GitLab Golang-based indexer processes and indicates how much data it ought to collect (and store in memory) in a given indexing process before submitting the payload to Elasticsearch’s Bulk API. This setting should be used with the Bulk request concurrency setting (see below) and needs to accommodate the resource constraints of both the Elast