Zoekt

  • Tier: Premium, Ultimate
  • Offering: GitLab.com, GitLab Self-Managed
  • Status: Beta

This feature is in beta and subject to change without notice. For more information, see epic 9404. To provide feedback on this feature, leave a comment on issue 420920.

Zoekt is an open-source search engine designed specifically to search for code.

With this integration, you can use exact code search instead of advanced search to search for code in GitLab. You can use exact match and regular expression modes to search for code in a group or repository.

Install Zoekt

Prerequisites:

  • You must have administrator access to the instance.

To enable exact code search in GitLab, you must have at least one Zoekt node connected to the instance. The following installation methods are supported for Zoekt:

The following installation methods are available for testing, not for production use:

Enable exact code search

Prerequisites:

  • You must have administrator access to the instance.
  • You must install Zoekt.

To enable exact code search in GitLab:

  1. On the left sidebar, at the bottom, select Admin.
  2. Select Settings > Search.
  3. Expand Exact code search configuration.
  4. Select the Enable indexing and Enable searching checkboxes.
  5. Select Save changes.

Check indexing status

Prerequisites:

  • You must have administrator access to the instance.

Indexing performance depends on the CPU and memory limits on the Zoekt indexer nodes. To check indexing status, use either of the following methods.

Run this Rake task:

gitlab-rake gitlab:zoekt:info

To have the data refresh automatically every 10 seconds, run this task instead:

gitlab-rake "gitlab:zoekt:info[10]"

In a Rails console, run these commands:

Search::Zoekt::Index.group(:state).count
Search::Zoekt::Repository.group(:state).count
Search::Zoekt::Task.group(:state).count

Run a health check

Prerequisites:

  • You must have administrator access to the instance.

Run a health check to understand the status of your Zoekt infrastructure, including:

  • Online and offline nodes
  • Indexing and search settings
  • Search API endpoints
  • JSON web token generation

To run a health check, execute the following task:

gitlab-rake gitlab:zoekt:health

This task provides:

  • The overall status: HEALTHY, DEGRADED, or UNHEALTHY
  • Recommendations for resolving detected issues
  • Exit codes for automation and monitoring integrations: 0=healthy, 1=degraded, or 2=unhealthy
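
Because the task reports its result through the exit code, a monitoring script can branch on that code. The following Python sketch shows one possible wrapper. The function names classify_exit_code and run_health_check are hypothetical, the UNKNOWN fallback is an assumption, and the sketch assumes gitlab-rake is on the PATH of the host where it runs.

```python
import subprocess

# Exit codes documented for the gitlab:zoekt:health task.
STATUS_BY_EXIT_CODE = {0: "HEALTHY", 1: "DEGRADED", 2: "UNHEALTHY"}


def classify_exit_code(code: int) -> str:
    """Map a health check exit code to its documented status."""
    # Any other code (for example, the task itself crashing) is reported
    # as UNKNOWN. This fallback is an assumption, not documented behavior.
    return STATUS_BY_EXIT_CODE.get(code, "UNKNOWN")


def run_health_check() -> str:
    """Run the Rake task once and return the classified status."""
    # check=False: a non-zero exit code is an expected result, not an error.
    result = subprocess.run(["gitlab-rake", "gitlab:zoekt:health"], check=False)
    return classify_exit_code(result.returncode)
```

A scheduler or alerting hook could call run_health_check() and notify only when the result is not HEALTHY.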

Run checks automatically

To run health checks automatically every 10 seconds, execute the following task:

gitlab-rake "gitlab:zoekt:health[10]"

The output includes colored status indicators and shows:

  • Online and offline node counts, storage usage warnings, and connectivity issues
  • Core settings validation and namespace and repository indexing statuses
  • The overall status including a combined health assessment: HEALTHY, DEGRADED, or UNHEALTHY
  • Recommendations for resolving issues

Pause indexing

Prerequisites:

  • You must have administrator access to the instance.

To pause indexing for exact code search:

  1. On the left sidebar, at the bottom, select Admin.
  2. Select Settings > Search.
  3. Expand Exact code search configuration.
  4. Select the Pause indexing checkbox.
  5. Select Save changes.

When you pause indexing for exact code search, all changes in your repository are queued. To resume indexing, clear the Pause indexing checkbox.

Index root namespaces automatically

Prerequisites:

  • You must have administrator access to the instance.

You can index both existing and new root namespaces automatically. To index all root namespaces automatically:

  1. On the left sidebar, at the bottom, select Admin.
  2. Select Settings > Search.
  3. Expand Exact code search configuration.
  4. Select the Index root namespaces automatically checkbox.
  5. Select Save changes.

When you enable this setting, GitLab creates indexing tasks for all projects in:

  • All groups and subgroups
  • Any new root namespace

After a project is indexed, GitLab creates only incremental indexing tasks when it detects a repository change.

When you disable this setting:

  • Existing root namespaces remain indexed.
  • New root namespaces are no longer indexed.

Cache search results

Prerequisites:

  • You must have administrator access to the instance.

You can cache search results for better performance. This feature is enabled by default and caches results for five minutes.

To cache search results:

  1. On the left sidebar, at the bottom, select Admin.
  2. Select Settings > Search.
  3. Expand Exact code search configuration.
  4. Select the Cache search results for five minutes checkbox.
  5. Select Save changes.

Set concurrent indexing tasks

Prerequisites:

  • You must have administrator access to the instance.

You can set the number of concurrent indexing tasks for a Zoekt node relative to its CPU capacity.

A higher multiplier lets more tasks run concurrently, which can improve indexing throughput at the cost of increased CPU usage. The default value is 1.0 (one task per CPU core).

You can adjust this value based on the node’s performance and workload. To set the number of concurrent indexing tasks:

  1. On the left sidebar, at the bottom, select Admin.

  2. Select Settings > Search.

  3. Expand Exact code search configuration.

  4. In the Indexing CPU to tasks multiplier text box, enter a value.

    For example, if a Zoekt node has 4 CPU cores and the multiplier is 1.5, the number of concurrent tasks for the node is 6.

  5. Select Save changes.
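
The arithmetic in step 4 can be sketched as follows. The helper concurrent_tasks is hypothetical. This page does not state how a fractional product is rounded, so rounding down with a minimum of one task is an assumption.

```python
import math


def concurrent_tasks(cpu_cores: int, multiplier: float = 1.0) -> int:
    """Estimate concurrent indexing tasks for a node.

    Multiplies the CPU core count by the multiplier. Rounding down and
    never returning fewer than one task are assumptions of this sketch.
    """
    if cpu_cores < 1 or multiplier <= 0:
        raise ValueError("cpu_cores must be >= 1 and multiplier must be > 0")
    return max(1, math.floor(cpu_cores * multiplier))
```

For the node in the example, concurrent_tasks(4, 1.5) gives 6. With the default multiplier, the result equals the core count.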

Set the number of parallel processes per indexing task

Prerequisites:

  • You must have administrator access to the instance.

You can set the number of parallel processes per indexing task.

A higher number can shorten indexing time at the cost of increased CPU and memory usage. The default value is 1 (one process per indexing task).

You can adjust this value based on the node’s performance and workload. To set the number of parallel processes per indexing task:

  1. On the left sidebar, at the bottom, select Admin.
  2. Select Settings > Search.
  3. Expand Exact code search configuration.
  4. In the Number of parallel processes per indexing task text box, enter a value.
  5. Select Save changes.

Set the number of namespaces per indexing rollout

Prerequisites:

  • You must have administrator access to the instance.

You can set the number of namespaces per RolloutWorker job for initial indexing. The default value is 32. You can adjust this value based on the node’s performance and workload.

To set the number of namespaces per indexing rollout:

  1. On the left sidebar, at the bottom, select Admin.
  2. Select Settings > Search.
  3. Expand Exact code search configuration.
  4. In the Number of namespaces per indexing rollout text box, enter a number greater than zero.
  5. Select Save changes.

Define when offline nodes are automatically deleted

Prerequisites:

  • You must have administrator access to the instance.

You can automatically delete offline Zoekt nodes, along with their related indices, repositories, and tasks, after a specified period of time. The default value is 12h (12 hours).

Use this setting to manage your Zoekt infrastructure and prevent orphaned resources. To define when offline nodes are automatically deleted:

  1. On the left sidebar, at the bottom, select Admin.
  2. Select Settings > Search.
  3. Expand Exact code search configuration.
  4. In the Offline nodes automatically deleted after text box, enter a value, such as 30m (30 minutes), 2h (two hours), or 1d (one day). To disable automatic deletion, enter 0.
  5. Select Save changes.
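
If you manage these settings from a script, it can help to validate a duration before you enter it. The following Python sketch converts the formats shown on this page (a number followed by m, h, or d, or a bare 0) to seconds. The function name parse_duration_seconds is hypothetical, and the sketch assumes no other units are accepted, which this page does not confirm.

```python
import re

# Seconds per unit, limited to the suffixes shown in the examples.
_UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}


def parse_duration_seconds(value: str) -> int:
    """Convert a duration such as '30m', '2h', or '1d' to seconds.

    A bare '0' returns 0, which this page describes as disabling the
    automatic deletion of offline nodes.
    """
    value = value.strip()
    if value == "0":
        return 0
    match = re.fullmatch(r"(\d+)([mhd])", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]
```

For example, the default of 12h corresponds to 43,200 seconds.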

Define the indexing timeout for a project

Prerequisites:

  • You must have administrator access to the instance.

You can define the indexing timeout for a project. The default value is 30m (30 minutes).

To define the indexing timeout for a project:

  1. On the left sidebar, at the bottom, select Admin.
  2. Select Settings > Search.
  3. Expand Exact code search configuration.
  4. In the Indexing timeout per project text box, enter a value, such as 30m (30 minutes), 2h (two hours), or 1d (one day).
  5. Select Save changes.

Set the maximum number of files in a project to be indexed

Prerequisites:

  • You must have administrator access to the instance.

You can set the maximum number of files in a project that can be indexed. Projects with more files than this limit in the default branch are not indexed.

The default value is 500,000.

You can adjust this value based on the node’s performance and workload. To set the maximum number of files in a project to be indexed:

  1. On the left sidebar, at the bottom, select Admin.
  2. Select Settings > Search.
  3. Expand Exact code search configuration.
  4. In the Maximum number of files per project to be indexed text box, enter a number greater than zero.
  5. Select Save changes.
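
To see whether a project is likely to exceed the limit, you can count the files on its default branch. The following Python sketch is one way to do that from a local clone. The function names are hypothetical. The sketch assumes HEAD points to the default branch and that counting the entries from git ls-tree approximates how the limit is evaluated, which this page does not specify.

```python
import subprocess

DEFAULT_MAX_FILES = 500_000  # default limit stated on this page


def exceeds_file_limit(file_count: int, limit: int = DEFAULT_MAX_FILES) -> bool:
    """Return True when a project has more files than the limit."""
    return file_count > limit


def count_tracked_files(repo_path: str, ref: str = "HEAD") -> int:
    """Count the files reachable from ref in a local clone."""
    output = subprocess.run(
        ["git", "-C", repo_path, "ls-tree", "-r", "--name-only", ref],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return len(output.splitlines())
```

A project with exactly 500,000 files is still within the default limit, because only projects with more files than the limit are skipped.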

Define the retry interval for failed namespaces

Prerequisites:

  • You must have administrator access to the instance.

You can define the retry interval for namespaces that previously failed. The default value is 1d (one day). A value of 0 means failed namespaces never retry.

To define the retry interval for failed namespaces:

  1. On the left sidebar, at the bottom, select Admin.
  2. Select Settings > Search.
  3. Expand Exact code search configuration.
  4. In the Retry interval for failed namespaces text box, enter a value, such as 30m (30 minutes), 2h (two hours), or 1d (one day).
  5. Select Save changes.

Run Zoekt on a separate server

Prerequisites:

  • You must have administrator access to the instance.

To run Zoekt on a different server than GitLab:

  1. Change the Gitaly listening interface.
  2. Install Zoekt.

Sizing recommendations

The following recommendations might be over-provisioned for some deployments. You should monitor your deployment to ensure:

  • No out-of-memory events occur.
  • CPU throttling is not excessive.
  • Indexing performance meets your requirements.

Adjust resources based on your specific workload characteristics, including:

  • Repository size and complexity
  • Number of active developers
  • Frequency of code changes
  • Indexing patterns

Nodes

For optimal performance, proper sizing of Zoekt nodes is crucial. Sizing recommendations differ between Kubernetes and VM deployments because the two allocate and manage resources differently.

Kubernetes deployments

The following table shows recommended resources for Kubernetes deployments based on index storage requirements:

Disk     Webserver CPU   Webserver memory   Indexer CPU   Indexer memory
128 GB   1               16 GiB             1             6 GiB
256 GB   1.5             32 GiB             1             8 GiB
512 GB   2               64 GiB             1             12 GiB
1 TB     3               128 GiB            1.5           24 GiB
2 TB     4               256 GiB            2             32 GiB

To manage resources more granularly, you can allocate CPU and memory separately to different containers.

For Kubernetes deployments:

  • Do not set CPU limits for Zoekt containers. CPU limits might cause unnecessary throttling during indexing bursts, which can significantly degrade performance. Instead, rely on resource requests to guarantee minimum CPU availability and let containers use additional CPU when it is available and needed.
  • Set appropriate memory limits to prevent resource contention and out-of-memory conditions.
  • Use high-performance storage classes for better indexing performance. GitLab.com uses pd-balanced on GCP, which balances performance and cost. Equivalent options include gp3 on AWS and Premium_LRS on Azure.
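
The first two recommendations can be expressed as a Kubernetes resources mapping that has CPU and memory requests and a memory limit, but no CPU limit. The following Python sketch builds such a mapping from the Kubernetes sizing table. The function names and the webserver and indexer keys are hypothetical. Using the table value as both the memory request and the memory limit is an assumption, not a documented recommendation.

```python
# Rows from the Kubernetes sizing table:
# disk -> (webserver CPU, webserver memory, indexer CPU, indexer memory)
K8S_SIZING = {
    "128 GB": ("1", "16Gi", "1", "6Gi"),
    "256 GB": ("1.5", "32Gi", "1", "8Gi"),
    "512 GB": ("2", "64Gi", "1", "12Gi"),
    "1 TB": ("3", "128Gi", "1.5", "24Gi"),
    "2 TB": ("4", "256Gi", "2", "32Gi"),
}


def container_resources(cpu: str, memory: str) -> dict:
    """Build a resources mapping with a memory limit and no CPU limit."""
    return {
        "requests": {"cpu": cpu, "memory": memory},
        # Deliberately no "cpu" key here, to avoid throttling during bursts.
        "limits": {"memory": memory},
    }


def zoekt_resources(disk: str) -> dict:
    """Return per-container resources for the given index disk size."""
    web_cpu, web_mem, idx_cpu, idx_mem = K8S_SIZING[disk]
    return {
        "webserver": container_resources(web_cpu, web_mem),
        "indexer": container_resources(idx_cpu, idx_mem),
    }
```

You could serialize the returned mapping to YAML and place it in the resources field of each container specification.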

VM and bare metal deployments

The following table shows recommended resources for VM and bare metal deployments based on index storage requirements:

Disk     VM size    Total CPU   Total memory   AWS          GCP             Azure
128 GB   Small      2 cores     16 GB          r5.large     n1-highmem-2    Standard_E2s_v3
256 GB   Medium     4 cores     32 GB          r5.xlarge    n1-highmem-4    Standard_E4s_v3
512 GB   Large      4 cores     64 GB          r5.2xlarge   n1-highmem-8    Standard_E8s_v3
1 TB     X-Large    8 cores     128 GB         r5.4xlarge   n1-highmem-16   Standard_E16s_v3
2 TB     2X-Large   16 cores    256 GB         r5.8xlarge   n1-highmem-32   Standard_E32s_v3

You can allocate these resources only to the entire node.
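
To choose a row from the VM and bare metal table, you can look up the smallest disk size that covers your expected index storage. The following Python sketch does that lookup. The function name smallest_vm_for is hypothetical, and treating 1 TB as 1,024 GB is an assumption.

```python
from typing import Optional, Tuple

# (disk in GB, VM size, total CPU cores, total memory in GB)
# Rows come from the VM and bare metal table; 1 TB is treated as 1,024 GB.
VM_SIZING = [
    (128, "Small", 2, 16),
    (256, "Medium", 4, 32),
    (512, "Large", 4, 64),
    (1024, "X-Large", 8, 128),
    (2048, "2X-Large", 16, 256),
]


def smallest_vm_for(index_storage_gb: float) -> Optional[Tuple[int, str, int, int]]:
    """Return the smallest table row whose disk covers the index storage.

    Returns None when the requirement exceeds the largest listed size,
    in which case adding nodes is the alternative.
    """
    for row in VM_SIZING:
        if index_storage_gb <= row[0]:
            return row
    return None
```

For example, an expected index size of 300 GB maps to the Large row.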

For VM and bare metal deployments:

  • Monitor CPU, memory, and disk usage to identify bottlenecks. Both webserver and indexer processes share the same CPU and memory resources.
  • Consider using SSD storage for better indexing performance.
  • Ensure adequate network bandwidth for data transfer between GitLab and Zoekt nodes.

Storage

Storage requirements for Zoekt vary significantly based on repository characteristics, including the number of large and binary files.

As a starting point, you can estimate your Zoekt storage to be half your Gitaly storage. For example, if your Gitaly storage is 1 TB, you might need approximately 500 GB of Zoekt storage.
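
This starting estimate can be written as a small calculation. In the following Python sketch, the function name and the ratio parameter are hypothetical. The 0.5 default reflects the rule of thumb above, and you would adjust it after observing actual usage.

```python
def estimate_zoekt_storage_gb(gitaly_storage_gb: float, ratio: float = 0.5) -> float:
    """Estimate Zoekt index storage as a fraction of Gitaly storage."""
    if gitaly_storage_gb < 0 or ratio <= 0:
        raise ValueError("storage must be >= 0 and ratio must be > 0")
    return gitaly_storage_gb * ratio
```

With 1,000 GB of Gitaly storage, the estimate is 500 GB. Repositories with many large or binary files can shift the ratio in either direction.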

To monitor the use of Zoekt nodes, see check indexing status. If namespaces are not being indexed due to low disk space, consider adding or scaling up nodes.

Security and authentication

Zoekt implements a multi-layered authentication system to secure communication between GitLab, Zoekt indexer, and Zoekt webserver components. Authentication is enforced across all communication channels.

All authentication methods use the GitLab Shell secret. Failed authentication attempts return 401 Unauthorized responses.

Zoekt indexer to GitLab

The Zoekt indexer authenticates to GitLab with JSON web tokens (JWT) to retrieve indexing tasks and send completion callbacks.

This method uses .gitlab_shell_secret for signing and verification. Tokens are sent in the Gitlab-Shell-Api-Request header. Endpoints include:

  • GET /internal/search/zoekt/:uuid/heartbeat for task retrieval
  • POST /internal/search/zoekt/:uuid/callback for status updates

This method ensures secure polling for task distribution and status reporting between Zoekt indexer nodes and GitLab.

GitLab to the Zoekt webserver

JWT authentication

GitLab authenticates to the Zoekt webserver with JSON web tokens (JWT) to execute search queries. These tokens provide time-limited, cryptographically signed authentication consistent with other GitLab authentication patterns.

This method uses Gitlab::Shell.secret_token and the HS256 algorithm (HMAC with SHA-256). Tokens are sent in the Authorization: Bearer <jwt_token> header and expire in five minutes to limit exposure.

Endpoints include /webserver/api/search and /webserver/api/v2/search. JWT claims are the issuer (gitlab) and the audience (gitlab-zoekt).

Basic authentication

GitLab authenticates to the Zoekt webserver with HTTP basic authentication through NGINX to execute search queries. Basic authentication is used primarily in GitLab Helm chart and Kubernetes deployments.

This method uses the username and password configured in Kubernetes secrets. Endpoints include /webserver/api/search and /webserver/api/v2/search on the Zoekt webserver.
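
For comparison, HTTP basic authentication sends the credentials Base64-encoded in the Authorization header. The following Python sketch builds that header. The function name is hypothetical, and the credentials are placeholders rather than values from any real deployment.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build an HTTP basic authentication header."""
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}
```

Base64 is an encoding, not encryption, so this method relies on the transport between GitLab and NGINX being protected.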