Zoekt

  • Tier: Premium, Ultimate
  • Offering: GitLab.com, GitLab Self-Managed
  • Status: Limited availability

This feature is in limited availability. For more information, see epic 9404. Provide feedback in issue 420920.

Zoekt is an open-source search engine designed specifically to search for code.

With this integration, you can use exact code search instead of advanced search to search for code in GitLab. You can use exact match and regular expression modes to search for code in a group or repository.

Zoekt handles only code search and does not replace Elasticsearch or OpenSearch. For all other search scopes, including comments, commits, epics, issues, merge requests, milestones, projects, users, and wikis, Elasticsearch or OpenSearch is still required.

Install Zoekt

Prerequisites:

  • Administrator access.

To enable exact code search in GitLab, you must have at least one Zoekt node connected to the instance. The following installation methods are supported for Zoekt:

The following installation methods are available for testing, not for production use:

From the GitLab UI

Prerequisites:

  • Administrator access.

To enable exact code search from the GitLab UI:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. Select the Enable indexing and Enable searching checkboxes.
  5. Select Save changes.

With Rake tasks

Prerequisites:

  • Administrator access.

You can manage exact code search with Rake tasks.

To enable indexing and search, run this Rake task:

gitlab-rake gitlab:zoekt:index

This task enables the zoekt_indexing_enabled, zoekt_search_enabled, and zoekt_auto_index_root_namespace settings. RolloutWorker then indexes all root namespaces automatically, and search becomes available when the indices are ready.

To disable indexing and search, run this Rake task:

gitlab-rake gitlab:zoekt:disable

This task disables both zoekt_indexing_enabled and zoekt_search_enabled.

Pause and resume indexing

To pause indexing (for example, during maintenance), run this Rake task:

gitlab-rake gitlab:zoekt:pause_indexing

To resume indexing, run this Rake task:

gitlab-rake gitlab:zoekt:resume_indexing

Estimate storage requirements

To estimate the storage required for your Zoekt nodes, run this Rake task:

sudo gitlab-rake gitlab:zoekt:estimate_storage

For more information, see estimate storage requirements.

Check indexing status

Prerequisites:

  • Administrator access.

Indexing performance depends on the CPU and memory limits on the Zoekt indexer nodes. To check indexing status:

Run this Rake task:

gitlab-rake gitlab:zoekt:info

To have the data refresh automatically every 10 seconds, run this Rake task instead:

gitlab-rake "gitlab:zoekt:info[10]"

Alternatively, in a Rails console, run these commands:

Search::Zoekt::Index.group(:state).count
Search::Zoekt::Repository.group(:state).count
Search::Zoekt::Task.group(:state).count
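Each of the console calls above returns a hash that maps a state to a count. As a minimal sketch, the following helper turns such a hash into a readiness summary. summarize_states is a hypothetical name, not part of GitLab; it assumes the success state is ready for indices and repositories and done for tasks (as in the sample output), and it accepts either string or symbol keys because the exact key type is not documented here.

```ruby
# Hypothetical helper (not part of GitLab): summarizes a state => count hash
# such as the one returned by Search::Zoekt::Index.group(:state).count.
def summarize_states(counts, done_state: "ready")
  # Accept both string and symbol keys.
  normalized = counts.transform_keys(&:to_s)
  total = normalized.values.sum
  done = normalized.fetch(done_state.to_s, 0)
  percent_done = total.zero? ? 0.0 : (done * 100.0 / total).round(2)
  others = normalized.reject { |state, _count| state == done_state.to_s }

  { total: total, done: done, percent_done: percent_done, other_states: others }
end

# Sample inputs shaped like the console results, not real output.
p summarize_states({ "ready" => 8, "pending" => 2 })
p summarize_states({ done: 10 }, done_state: "done")
```

In a Rails console, you could pass the result of Search::Zoekt::Task.group(:state).count with done_state: "done".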

Sample output

The gitlab:zoekt:info Rake task returns an output similar to the following:

Exact Code Search
GitLab version:                                      19.0.0
Enable indexing:                                     yes
Enable searching:                                    yes
Pause indexing:                                      no
Index root namespaces automatically:                 yes
Cache search results for five minutes:               yes
Indexing CPU to tasks multiplier:                    1.0
Probability of random force reindexing (percentage): 0.25
Number of parallel processes per indexing task:      1
Number of namespaces per indexing rollout:           32
Offline nodes automatically deleted after:           20m
Indexing timeout per project:                        30m
Maximum number of files per project to be indexed:   500000
Maximum file size for indexing:                      1MB
Maximum trigrams per file:                           20000
Retry interval for failed namespaces:                1d
Number of replicas per namespace:                    1
Maximum projects for legacy search:                  1000

Nodes
# Number of Zoekt nodes and their status
Node count:                   2 (online: 2, offline: 0)
Last seen at:                 2026-04-16 22:58:09 UTC (less than a minute ago)
Max schema_version:           2601
Storage reserved / usable:    71.1 MiB / 124 GiB (0.06%)
Storage indexed / reserved:   42.7 MiB / 71.1 MiB (60.0%)
Storage used / total:         797 GiB / 921 GiB (86.54%)
Online node watermark levels: 2
  - low: 2

Indexing status
Group count:                      8
# Number of enabled namespaces and their status
EnabledNamespace count:           8 (without indices: 0, rollout blocked: 0, with search disabled: 0)
Replicas count:                   8
  - ready: 8
Indices count:                    8
  - ready: 8
Indices watermark levels:         8
  - healthy: 8
Repositories count:               10
  - ready: 10
Tasks count:                      10
  - done: 10
Tasks pending/processing by type: (none)
Storage buffer factor:            0.831× [dynamic (observed)]

Feature Flags (Non-Default Values)
- zoekt_offset_pagination:      disabled

Feature Flags (Default Values)
- zoekt_batch_update_index_storage_bytes:  disabled
- zoekt_cap_file_match_results:            disabled

Node Details
Node 1 - test-zoekt-hostname-1:
  Status:                       Online
  Last seen at:                 2026-04-16 22:58:09 UTC (less than a minute ago)
  Disk utilization:             86.54%
  Unclaimed storage:            62 GiB
  # Zoekt build version on the node. Must match GitLab version.
  Zoekt version:                2026.04.15-v1.4.0-1-g89a8871
  Schema version:               2601
Node 2 - test-zoekt-hostname-2:
  Status:                       Online
  Last seen at:                 2026-04-16 22:58:09 UTC (less than a minute ago)
  Disk utilization:             86.54%
  Unclaimed storage:            62 GiB
  Zoekt version:                2026.04.15-v1.4.0-1-g89a8871
  Schema version:               2601

Run a health check

Prerequisites:

  • Administrator access.

Run a health check to understand the status of your Zoekt infrastructure, including:

  • Online and offline nodes
  • Indexing and search settings
  • Search API endpoints
  • JSON web token generation

To run a health check, execute the following task:

gitlab-rake gitlab:zoekt:health

This task provides:

  • The overall status: HEALTHY, DEGRADED, or UNHEALTHY
  • Recommendations for resolving detected issues
  • Exit codes for automation and monitoring integrations: 0=healthy, 1=degraded, or 2=unhealthy
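Because the exit codes are documented, the health check can be wired into a monitoring script. The following is a minimal sketch, assuming gitlab-rake is on the PATH of the user running the script (you might need sudo, as with the storage estimate task). The names health_status and run_zoekt_health_check, the --run flag, and exit code 3 for unknown results are hypothetical and introduced only for this example.

```ruby
# Map the documented health check exit codes to status names.
HEALTH_STATUS = { 0 => "HEALTHY", 1 => "DEGRADED", 2 => "UNHEALTHY" }.freeze

def health_status(exit_code)
  # Codes outside 0..2 (or nil, if the process was killed by a signal)
  # are reported as unknown instead of guessed.
  HEALTH_STATUS.fetch(exit_code, "UNKNOWN")
end

def run_zoekt_health_check
  # system returns nil if the command cannot be executed at all.
  ran = system("gitlab-rake", "gitlab:zoekt:health")
  return "UNKNOWN" if ran.nil?

  health_status($?.exitstatus)
end

if $PROGRAM_NAME == __FILE__ && ARGV.include?("--run")
  status = run_zoekt_health_check
  puts "Zoekt health: #{status}"
  exit(HEALTH_STATUS.key(status) || 3) # hypothetical code 3 for "unknown"
end
```

Propagating the same exit code lets an external scheduler treat 1 as a warning and 2 as a failure.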

Run checks automatically

To run health checks automatically every 10 seconds, execute the following task:

gitlab-rake "gitlab:zoekt:health[10]"

The output includes colored status indicators and shows:

  • Online and offline node counts, storage usage warnings, and connectivity issues
  • Core settings validation and namespace and repository indexing statuses
  • The overall status including a combined health assessment: HEALTHY, DEGRADED, or UNHEALTHY
  • Recommendations for resolving issues

Force reindex projects

Prerequisites:

  • Administrator access.

To force reindex a range of projects, run this Rake task:

gitlab-rake gitlab:zoekt:reindex_projects ID_FROM=10 ID_TO=20

ID_FROM and ID_TO represent the range of project IDs.

To force reindex only one project, use the same value for both ID_FROM and ID_TO. To force reindex all projects, do not use these environment variables.
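The three cases above (a range, a single project, or all projects) can be captured in a small command builder. This is a sketch only: reindex_command is a hypothetical helper, and requiring both IDs together is a choice made here, since the behavior when only one variable is set is not described on this page.

```ruby
# Hypothetical helper: builds the gitlab-rake invocation for force reindexing.
# Pass both IDs for a range, the same ID twice for one project, or neither
# to reindex all projects.
def reindex_command(id_from: nil, id_to: nil)
  if id_from.nil? != id_to.nil?
    raise ArgumentError, "set both id_from and id_to, or neither"
  end
  if id_from && id_from > id_to
    raise ArgumentError, "id_from must not be greater than id_to"
  end

  cmd = ["gitlab-rake", "gitlab:zoekt:reindex_projects"]
  # Rake treats NAME=value arguments as environment variables.
  cmd += ["ID_FROM=#{id_from}", "ID_TO=#{id_to}"] if id_from
  cmd
end

p reindex_command(id_from: 10, id_to: 20) # a range of projects
p reindex_command(id_from: 42, id_to: 42) # one project
p reindex_command                         # all projects
```

To run it, pass the array to system, for example system(*reindex_command(id_from: 10, id_to: 20)).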

Pause indexing

Prerequisites:

  • Administrator access.

To pause indexing for exact code search:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. Select the Pause indexing checkbox.
  5. Select Save changes.

When you pause indexing for exact code search, all repository changes are queued. To resume indexing, clear the Pause indexing checkbox.

Index root namespaces automatically

Prerequisites:

  • Administrator access.

You can index both existing and new root namespaces automatically. To index all root namespaces automatically:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. Select the Index root namespaces automatically checkbox.
  5. Select Save changes.

When you enable this setting, GitLab creates indexing tasks for all projects in:

  • All groups and subgroups
  • Any new root namespace

After a project is indexed, GitLab creates only incremental indexing tasks when it detects a repository change.

When you disable this setting:

  • Existing root namespaces remain indexed.
  • New root namespaces are no longer indexed.

Cache search results

Prerequisites:

  • Administrator access.

You can cache search results for better performance. This feature is enabled by default and caches results for five minutes.

To cache search results:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. Select the Cache search results for five minutes checkbox.
  5. Select Save changes.

Set concurrent indexing tasks

Prerequisites:

  • Administrator access.

You can set the number of concurrent indexing tasks for a Zoekt node relative to its CPU capacity.

A higher multiplier allows more tasks to run concurrently, which can improve indexing throughput at the cost of increased CPU usage. The default value is 1.0 (one task per CPU core).

You can adjust this value based on the node’s performance and workload. To set the number of concurrent indexing tasks:

  1. In the upper-right corner, select Admin.

  2. In the left sidebar, select Settings > Search.

  3. Expand Exact code search.

  4. In the Indexing CPU to tasks multiplier text box, enter a value.

    For example, if a Zoekt node has 4 CPU cores and the multiplier is 1.5, the number of concurrent tasks for the node is 6.

  5. Select Save changes.
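The calculation in the example above is CPU cores multiplied by the multiplier. The following sketch reproduces it. It assumes the product is rounded to the nearest integer and that at least one task is always allowed; both are assumptions, and GitLab's exact rounding might differ. concurrent_indexing_tasks is a hypothetical name.

```ruby
# Sketch: concurrent indexing tasks for a node = CPU cores x multiplier.
# Assumption: round to the nearest integer and never go below one task.
def concurrent_indexing_tasks(cpu_cores, multiplier)
  raise ArgumentError, "cpu_cores must be positive" unless cpu_cores.positive?
  raise ArgumentError, "multiplier must be positive" unless multiplier.positive?

  [(cpu_cores * multiplier).round, 1].max
end

puts concurrent_indexing_tasks(4, 1.5) # example from the steps above: 6
puts concurrent_indexing_tasks(4, 1.0) # default multiplier: one task per core
```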

Define the probability of random force reindexing

Prerequisites:

  • Administrator access.

You can define the probability that a project is force reindexed instead of incrementally indexed. The default value is 0.25 (0.25%).

Force reindexing helps prevent memory map (mmap) handlers from running out by periodically rebuilding indices from scratch. A higher percentage increases indexing load, especially for very large repositories.
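The setting is a percentage, so the default of 0.25 means roughly one force reindex in every 400 indexing runs. The following sketch models it as an independent random trial per run. This is an illustration of what the percentage means, not GitLab's sampling code, and force_reindex? is a hypothetical name.

```ruby
# Sketch: decide per indexing run whether to force reindex.
# probability_percent uses the same units as the setting (0.25 means 0.25%).
def force_reindex?(probability_percent, rng: Random)
  unless (0..100).cover?(probability_percent)
    raise ArgumentError, "probability must be between 0 and 100"
  end

  # rng.rand returns a float in [0, 1); scale it to a percentage.
  rng.rand * 100 < probability_percent
end

# Simulate 1,000,000 indexing runs with the default of 0.25%.
rng = Random.new(1234)
forced = 1_000_000.times.count { force_reindex?(0.25, rng: rng) }
puts "forced reindexes: #{forced} (expected about 2,500)"
```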

To define the probability of random force reindexing:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. In the Probability of random force reindexing (percentage) text box, enter a number between 0 and 100.
  5. Select Save changes.

Set the number of parallel processes per indexing task

Prerequisites:

  • Administrator access.

You can set the number of parallel processes per indexing task.

A higher number can reduce indexing time at the cost of increased CPU and memory usage. The default value is 1 (one process per indexing task).

You can adjust this value based on the node’s performance and workload. To set the number of parallel processes per indexing task:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. In the Number of parallel processes per indexing task text box, enter a value.
  5. Select Save changes.

Set the number of namespaces per indexing rollout

Prerequisites:

  • Administrator access.

You can set the number of namespaces per RolloutWorker job for initial indexing. The default value is 32. You can adjust this value based on the node’s performance and workload.
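To see how the batch size affects initial rollout, you can estimate how many RolloutWorker jobs are needed to cover all root namespaces. This sketch assumes every job picks up a full batch until the backlog is empty, which is a simplification; rollout_jobs_needed is a hypothetical name.

```ruby
# Sketch: number of RolloutWorker jobs needed to cover all root namespaces
# if each job processes at most batch_size namespaces.
def rollout_jobs_needed(namespace_count, batch_size = 32)
  raise ArgumentError, "batch_size must be greater than zero" unless batch_size.positive?

  (namespace_count + batch_size - 1) / batch_size # integer ceiling division
end

puts rollout_jobs_needed(1_000)      # default of 32 namespaces per rollout: 32
puts rollout_jobs_needed(1_000, 100) # larger batches, fewer jobs: 10
```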

To set the number of namespaces per indexing rollout:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. In the Number of namespaces per indexing rollout text box, enter a number greater than zero.
  5. Select Save changes.

Define when offline nodes are automatically deleted

Prerequisites:

  • Administrator access.

You can delete offline Zoekt nodes automatically after a specific period of time along with their related indices, repositories, and tasks. The default value is 12h (12 hours).

Use this setting to manage your Zoekt infrastructure and prevent orphaned resources. To define when offline nodes are automatically deleted:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. In the Offline nodes automatically deleted after text box, enter a value (for example, 30m (30 minutes), 2h (two hours), or 1d (one day)). To disable automatic deletion, set to 0.
  5. Select Save changes.
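Several settings on this page (offline node deletion, project indexing timeout, and failed namespace retry interval) take durations such as 30m, 2h, or 1d. The following hypothetical parser shows how those examples map to seconds. It assumes one integer followed by one unit (m, h, or d) and treats a bare 0 as disabled; GitLab's own parsing might accept other forms.

```ruby
# Hypothetical parser for duration settings such as 30m, 2h, or 1d.
# Assumptions: one integer plus one unit (m, h, d); a bare 0 means disabled.
DURATION_UNITS = { "m" => 60, "h" => 3_600, "d" => 86_400 }.freeze

def parse_duration(value)
  text = value.to_s.strip.downcase
  return nil if text == "0" # 0 disables automatic deletion or retries

  match = /\A(\d+)([mhd])\z/.match(text)
  raise ArgumentError, "invalid duration: #{value.inspect}" unless match

  match[1].to_i * DURATION_UNITS.fetch(match[2])
end

puts parse_duration("12h") # default for deleting offline nodes, in seconds
puts parse_duration("30m")
p parse_duration("0")
```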

Define the indexing timeout for a project

Prerequisites:

  • Administrator access.

You can define the indexing timeout for a project. The default value is 30m (30 minutes).

To define the indexing timeout for a project:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. In the Indexing timeout per project text box, enter a value (for example, 30m (30 minutes), 2h (two hours), or 1d (one day)).
  5. Select Save changes.

Set the maximum number of files in a project to be indexed

Prerequisites:

  • Administrator access.

You can set the maximum number of files in a project that can be indexed. Projects with more files than this limit on the default branch are not indexed. The default value is 500,000.

You can adjust this value based on the node’s performance and workload. To set the maximum number of files in a project to be indexed:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. In the Maximum number of files per project to be indexed text box, enter a number greater than zero.
  5. Select Save changes.

Set maximum file size for indexing

Prerequisites:

  • Administrator access.

You can set the maximum size for a file to be indexed. The default value is 1MB.

For files that exceed the specified size, only filenames are indexed. You can search these files only by filename.

To set maximum file size for indexing:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. In the Maximum file size for indexing text box, enter a value (for example, 512B, 50KB, 2MB, or 1GB). The value can also be in lowercase.
  5. Select Save changes.
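The following hypothetical sketch parses the size formats from the example above and applies the filename-only rule. It assumes units are case-insensitive and binary (1KB = 1,024 bytes); whether GitLab uses binary or decimal units is not stated here, so treat that as an assumption.

```ruby
# Hypothetical parser for file size values such as 512B, 50KB, 2MB, or 1GB.
# Assumption: units are case-insensitive and binary (1KB = 1,024 bytes).
SIZE_UNITS = { "b" => 1, "kb" => 1024, "mb" => 1024**2, "gb" => 1024**3 }.freeze

def parse_file_size(value)
  match = /\A(\d+)\s*(b|kb|mb|gb)\z/.match(value.to_s.strip.downcase)
  raise ArgumentError, "invalid size: #{value.inspect}" unless match

  match[1].to_i * SIZE_UNITS.fetch(match[2])
end

# Files larger than the limit have only their filename indexed.
def content_indexed?(file_bytes, limit)
  file_bytes <= parse_file_size(limit)
end

puts parse_file_size("1MB")            # default limit in bytes
puts content_indexed?(2_000_000, "1MB") # too large: filename only
```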

Set the maximum trigram count for indexing

Prerequisites:

  • Administrator access.

You can set the maximum number of trigrams for a file to be indexed. The default value is 20,000.

Trigrams are three-character sequences that Zoekt uses for efficient code search. For files that exceed this trigram limit, only filenames are indexed. A higher limit affects both indexing and search performance.

To set the maximum trigram count for indexing:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. In the Maximum trigrams per file text box, enter a number greater than zero.
  5. Select Save changes.
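The file count, file size, and trigram limits together decide how much of a project is indexed. The following sketch combines them using the defaults on this page. It assumes trigrams are counted as distinct three-character sequences over characters; Zoekt's exact counting rules might differ. index_mode, distinct_trigrams, and project_indexed? are hypothetical names.

```ruby
require "set"

# Sketch: distinct three-character sequences (trigrams) in a file.
# Assumption: counted over characters and deduplicated.
def distinct_trigrams(text)
  text.chars.each_cons(3).map(&:join).to_set
end

# Per-file decision using the default size and trigram limits.
def index_mode(text, max_file_bytes: 1024 * 1024, max_trigrams: 20_000)
  return :filename_only if text.bytesize > max_file_bytes
  return :filename_only if distinct_trigrams(text).size > max_trigrams

  :full_content
end

# Projects with more files than the limit on the default branch are skipped.
def project_indexed?(file_count, max_files: 500_000)
  file_count <= max_files
end

p distinct_trigrams("abcd").to_a             # two trigrams: "abc" and "bcd"
p index_mode("def hello; end")               # small file: full content
p index_mode("abcdef", max_trigrams: 3)      # four trigrams over the limit
```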

Define the retry interval for failed namespaces

Prerequisites:

  • Administrator access.

You can define the retry interval for namespaces that previously failed. The default value is 1d (one day). A value of 0 means failed namespaces are never retried.

To define the retry interval for failed namespaces:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. In the Retry interval for failed namespaces text box, enter a value (for example, 30m (30 minutes), 2h (two hours), or 1d (one day)).
  5. Select Save changes.

Set the number of replicas per namespace

Prerequisites:

  • Administrator access.

You can set the number of replicas per namespace. The default value is 1 (one replica per namespace).

Increasing the number of replicas per namespace improves search availability by distributing the load across multiple Zoekt nodes. More replicas increase storage requirements.

To set the number of replicas per namespace:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Settings > Search.
  3. Expand Exact code search.
  4. In the Number of replicas per namespace text box, enter a number greater than zero.
  5. Select Save changes.

Run Zoekt on a separate server

Prerequisites:

  • Administrator access.

To run Zoekt on a different server than GitLab:

  1. Change the Gitaly listening interface.
  2. Install Zoekt.

Sizing recommendations

The following recommendations might be over-provisioned for some deployments. You should monitor your deployment to ensure:

  • No out-of-memory events occur.
  • CPU throttling is not excessive.
  • Indexing performance meets your requirements.

Adjust resources based on your specific workload characteristics, including:

  • Repository size and complexity
  • Number of active developers
  • Frequency of code changes
  • Indexing patterns

Memory architecture

The webserver and indexer have different memory usage patterns.

The webserver memory-maps index shards from disk into virtual memory. The operating system pages shard data in and out of physical memory as searches are served. Resident memory usage grows with the active working set. Nodes with larger indices or higher query volume require more webserver memory to avoid page thrashing and out-of-memory conditions.

When the indexer builds or rebuilds indices, the indexer processes Git object data in memory. Memory usage spikes when large repositories are indexed or multiple tasks run in parallel. You can control peak indexer memory by adjusting the number of parallel processes per indexing task and concurrent indexing tasks.
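The two indexer settings multiply. As a rough upper bound, a node can run up to (concurrent indexing tasks) × (parallel processes per task) indexing processes at once. The following sketch computes that bound. It assumes every concurrent task uses all its processes at the same time and that concurrent tasks are CPU cores × multiplier rounded to the nearest integer; both are simplifying assumptions. max_indexing_processes is a hypothetical name.

```ruby
# Sketch: upper bound on simultaneous indexing processes on one node.
# Assumption: every concurrent task runs all its parallel processes at once.
def max_indexing_processes(cpu_cores, cpu_multiplier, processes_per_task)
  concurrent_tasks = [(cpu_cores * cpu_multiplier).round, 1].max
  concurrent_tasks * processes_per_task
end

puts max_indexing_processes(4, 1.0, 1) # defaults on a 4-core node: 4
puts max_indexing_processes(4, 1.5, 2) # 6 tasks x 2 processes: 12
```

Lowering either setting lowers this bound, which is how you limit peak indexer memory.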

On VM and bare metal deployments, the webserver and indexer share the same system memory.

Nodes

For optimal performance, proper sizing of Zoekt nodes is crucial. Sizing recommendations differ between Kubernetes and VM deployments due to how resources are allocated and managed.

Kubernetes deployments

The following table shows recommended resources per node (per StatefulSet pod) for Kubernetes deployments based on index storage requirements. Each pod in the StatefulSet runs its own webserver and indexer containers with independent resource allocations and its own persistent volume for index storage. If you run multiple nodes, multiply these resources by the number of nodes to calculate total cluster resources.

Disk     Webserver CPU   Webserver memory   Indexer CPU   Indexer memory
128 GB   1               16 GiB             1             6 GiB
256 GB   1.5             32 GiB             1             8 GiB
512 GB   2               64 GiB             1             12 GiB
1 TB     3               128 GiB            1.5           24 GiB
2 TB     4               256 GiB            2             32 GiB

To manage resources more granularly, you can allocate CPU and memory separately to different containers.

For Kubernetes deployments:

  • Do not set CPU limits for Zoekt containers. CPU limits might cause unnecessary throttling during indexing bursts, which can significantly reduce performance. Instead, rely on resource requests to guarantee minimum CPU availability while allowing containers to use additional CPU when it is available and needed.
  • Set appropriate memory limits to prevent resource contention and out-of-memory conditions.
  • Use high-performance storage classes for better indexing performance. GitLab.com uses pd-balanced on GCP, which balances performance and cost. Equivalent options include gp3 on AWS and Premium_LRS on Azure.

VM and bare metal deployments

The following table shows recommended resources per node for VM and bare metal deployments based on index storage requirements. If you run multiple nodes, multiply these resources by the number of nodes to calculate total cluster resources.

Disk     VM size    Total CPU   Total memory   AWS          GCP             Azure
128 GB   Small      2 cores     16 GB          r5.large     n1-highmem-2    Standard_E2s_v3
256 GB   Medium     4 cores     32 GB          r5.xlarge    n1-highmem-4    Standard_E4s_v3
512 GB   Large      4 cores     64 GB          r5.2xlarge   n1-highmem-8    Standard_E8s_v3
1 TB     X-Large    8 cores     128 GB         r5.4xlarge   n1-highmem-16   Standard_E16s_v3
2 TB     2X-Large   16 cores    256 GB         r5.8xlarge   n1-highmem-32   Standard_E32s_v3

You can allocate these resources only to the entire node.

For VM and bare metal deployments:

  • Monitor CPU, memory, and disk usage to identify bottlenecks.
  • Consider using SSD storage for better indexing performance.
  • Ensure adequate network bandwidth for data transfer between GitLab and Zoekt nodes.

Storage

Zoekt storage requirements depend on the size of your Git repositories and your replica configuration. Zoekt indexes only Git object data (source code and commit history). It does not index LFS files, CI/CD artifacts, packages, wikis, or other storage components.

Estimate requirements

To estimate storage requirements, run this Rake task:

sudo gitlab-rake gitlab:zoekt:estimate_storage

This task queries your GitLab database and outputs a storage estimate based on your current repository sizes and replica configuration.

To calculate storage requirements manually, use these formulas instead:

storage_per_replica = sum(repository_git_size) × buffer_factor
total_cluster_storage = storage_per_replica × number_of_replicas

repository_git_size is the Git object size for each repository. This value does not include LFS objects, wikis, artifacts, or packages. buffer_factor is the multiplier that reserves headroom during initial indexing. You can get this value from Search::Zoekt::Index.global_buffer_factor in a Rails console; it is typically 3 by default.

To view repository_git_size:

  1. In the upper-right corner, select Admin.
  2. In the left sidebar, select Overview > Projects.
  3. In the Repository column, view the Git object size.

For the initial provisioning target, start with three times your total repository_git_size multiplied by replica count. For example:

  • 100 GB of Git repository data and one replica: 300 GB of Zoekt storage.
  • 100 GB of Git repository data and two replicas: 600 GB of Zoekt storage.

GitLab reserves this buffer internally to ensure Zoekt has headroom during indexing. After initial indexing is complete, actual disk usage is typically closer to half the repository_git_size based on observed data on GitLab.com. Scale vertically or horizontally only when needed.
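The two formulas above translate directly into code. This sketch uses the default buffer factor of 3 from the provisioning guidance and sizes in GB; the repository sizes are hypothetical values that total 100 GB, matching the examples. To model the dynamic value, pass the Storage buffer factor reported by gitlab:zoekt:info instead.

```ruby
# Sketch of the storage formulas. Sizes are in GB.
def storage_per_replica(repository_git_sizes, buffer_factor: 3.0)
  repository_git_sizes.sum * buffer_factor
end

def total_cluster_storage(repository_git_sizes, replicas:, buffer_factor: 3.0)
  raise ArgumentError, "replicas must be at least 1" if replicas < 1

  storage_per_replica(repository_git_sizes, buffer_factor: buffer_factor) * replicas
end

repos = [60, 40] # hypothetical repository_git_size values totaling 100 GB
puts total_cluster_storage(repos, replicas: 1) # prints 300.0
puts total_cluster_storage(repos, replicas: 2) # prints 600.0
```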

To view the current buffer factor, run this Rake task:

sudo gitlab-rake gitlab:zoekt:info

The output includes Storage buffer factor, which shows the dynamic value the planner is using.

To monitor Zoekt node storage, see check indexing status. If namespaces are not indexed due to low disk space, add nodes or increase disk capacity.

Security and authentication

Zoekt implements a multi-layered authentication system to secure communication between GitLab, Zoekt indexer, and Zoekt webserver components. Authentication is enforced across all communication channels.

All authentication methods use the GitLab Shell secret. Failed authentication attempts return 401 Unauthorized responses.

Zoekt indexer to GitLab

The Zoekt indexer authenticates to GitLab with JSON web tokens (JWT) to retrieve indexing tasks and send completion callbacks.

This method uses .gitlab_shell_secret for signing and verification. Tokens are sent in the Gitlab-Shell-Api-Request header. The following endpoints are available:

  • GET /internal/search/zoekt/:uuid/heartbeat for task retrieval
  • POST /internal/search/zoekt/:uuid/callback for status updates

This method ensures secure polling for task distribution and status reporting between Zoekt indexer nodes and GitLab.

GitLab to the Zoekt webserver

JWT authentication

GitLab authenticates to the Zoekt webserver with JSON web tokens (JWT) to execute search queries. JWTs provide time-limited, cryptographically signed authentication consistent with other GitLab authentication patterns.

This method uses Gitlab::Shell.secret_token and the HS256 algorithm (HMAC with SHA-256). Tokens are sent in the Authorization: Bearer <jwt_token> header and expire in five minutes to limit exposure.

Endpoints include /webserver/api/search and /webserver/api/v2/search. The JWT claims include the issuer (iss, set to gitlab) and the audience (aud, set to gitlab-zoekt).
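To make the token flow concrete, here is a minimal sketch of signing and verifying an HS256 JWT with the claims described above, using only the Ruby standard library. It is not GitLab's implementation: the secret is a placeholder for the GitLab Shell secret, the helper names are hypothetical, and the five-minute lifetime is expressed as a standard exp claim, which is an assumption about how expiry is encoded.

```ruby
require "json"
require "openssl"

def base64url(data)
  [data].pack("m0").tr("+/", "-_").delete("=")
end

def base64url_decode(text)
  padded = text.tr("-_", "+/")
  padded += "=" * ((4 - padded.length % 4) % 4)
  padded.unpack1("m0").force_encoding(Encoding::UTF_8)
end

# Signs a token with iss/aud claims that expires after ttl seconds.
def zoekt_search_jwt(secret, now: Time.now.to_i, ttl: 300)
  header = { alg: "HS256", typ: "JWT" }
  payload = { iss: "gitlab", aud: "gitlab-zoekt", exp: now + ttl }
  signing_input = [header, payload].map { |part| base64url(JSON.generate(part)) }.join(".")
  signature = OpenSSL::HMAC.digest("SHA256", secret, signing_input)
  "#{signing_input}.#{base64url(signature)}"
end

# Constant-time comparison to avoid leaking timing information.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Verifies signature, algorithm, issuer, audience, and expiry.
# Returns the claims hash, or nil if any check fails.
def verify_zoekt_search_jwt(token, secret, now: Time.now.to_i)
  header_b64, payload_b64, signature_b64, extra = token.split(".")
  return nil if extra || [header_b64, payload_b64, signature_b64].any?(&:nil?)

  expected = base64url(OpenSSL::HMAC.digest("SHA256", secret, "#{header_b64}.#{payload_b64}"))
  return nil unless secure_compare(expected, signature_b64)

  header = JSON.parse(base64url_decode(header_b64))
  claims = JSON.parse(base64url_decode(payload_b64))
  return nil unless header["alg"] == "HS256"
  return nil unless claims["iss"] == "gitlab" && claims["aud"] == "gitlab-zoekt"
  return nil if claims["exp"].to_i <= now

  claims
end

secret = "placeholder-gitlab-shell-secret"
puts "Authorization: Bearer #{zoekt_search_jwt(secret)}"
```

A request that presents a token signed with a different secret, or after expiry, fails verification, which corresponds to the 401 Unauthorized responses described earlier.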