Monitoring Gitaly and Gitaly Cluster

You can use the available logs and Prometheus metrics to monitor Gitaly and Gitaly Cluster (Praefect).

Metric definitions are available:

Directly from the Prometheus /metrics endpoint configured for Gitaly.
Using Grafana Explore on a Grafana instance configured against Prometheus.

Monitor Gitaly rate limiting (deprecated)

This feature was deprecated in GitLab 17.7 and is planned for removal in 18.0. Use concurrency limiting instead.

Gitaly can be configured to limit requests based on:

Concurrency of requests.
A rate limit.

Monitor Gitaly concurrency limiting

You can observe specific behavior of concurrency-queued requests using Gitaly logs and Prometheus.

In the Gitaly logs, you can identify logs related to concurrency limiting with entries such as:

Log Field	Description
`limit.concurrency_queue_length`	Indicates the current length of the queue specific to the RPC type of the ongoing call. It provides insight into the number of requests waiting to be processed due to concurrency limits.
`limit.concurrency_queue_ms`	Represents the duration, in milliseconds, that a request has spent waiting in the queue due to the limit on concurrent RPCs. This field helps understand the impact of concurrency limits on request processing times.
`limit.concurrency_dropped`	If the request is dropped due to limits being reached, this field specifies the reason: either `max_time` (request waited in the queue longer than the maximum allowed time) or `max_size` (the queue reached its maximum size).
`limit.limiting_key`	Identifies the key used for limiting.
`limit.limiting_type`	Specifies the type of process being limited. In this context, it’s `per-rpc`, indicating that the concurrency limiting is applied on a per-RPC basis.

For example:

{
  "limit.concurrency_queue_length": 1,
  "limit.concurrency_queue_ms": 0,
  "limit.limiting_key": "@hashed/79/02/7902699be42c8a8e46fbbb450172651786b22c56a189f7625a6da49081b2451.git",
  "limit.limiting_type": "per-rpc"
}
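
As a rough illustration of how these fields could be used, the following Python sketch scans JSON-formatted Gitaly log lines and reports the longest queue wait seen per limiting key. It assumes one JSON object per line with the fields as top-level keys, as in the example above. The function name `summarize_queue_waits` and the sample lines are hypothetical and not part of Gitaly.

```python
import json
from collections import defaultdict


def summarize_queue_waits(lines):
    """Return {limiting_key: longest queue wait in ms} for per-RPC limited requests."""
    waits = defaultdict(float)
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        if entry.get("limit.limiting_type") != "per-rpc":
            continue  # only look at per-RPC concurrency limiting
        key = entry.get("limit.limiting_key", "unknown")
        wait_ms = entry.get("limit.concurrency_queue_ms", 0)
        waits[key] = max(waits[key], wait_ms)
    return dict(waits)


# Hypothetical log lines, shaped like the example above.
sample = [
    '{"limit.concurrency_queue_length": 1, "limit.concurrency_queue_ms": 0, '
    '"limit.limiting_key": "repo-a", "limit.limiting_type": "per-rpc"}',
    '{"limit.concurrency_queue_length": 3, "limit.concurrency_queue_ms": 250, '
    '"limit.limiting_key": "repo-a", "limit.limiting_type": "per-rpc"}',
    '{"limit.concurrency_queue_ms": 40, "limit.limiting_key": "1.2.3.4", '
    '"limit.limiting_type": "pack-objects"}',
    "not a JSON line",
]
print(summarize_queue_waits(sample))
```

A key with a consistently high wait time suggests that requests for that repository are regularly hitting the concurrency limit.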

In Prometheus, look for the following metrics:

gitaly_concurrency_limiting_in_progress indicates how many concurrent requests are being processed.
gitaly_concurrency_limiting_queued indicates how many requests for an RPC for a given repository are waiting due to the concurrency limit being reached.
gitaly_concurrency_limiting_acquiring_seconds indicates how long a request has to wait due to concurrency limits before being processed.
gitaly_requests_dropped_total provides a total count of requests dropped due to request limiting. The reason label indicates why a request was dropped:
- max_size, because the concurrency queue size was reached.
- max_time, because the request exceeded the maximum queue wait time as configured in Gitaly.
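
One possible way to watch dropped requests is to chart their rate per `reason` label. The Python sketch below builds such a PromQL expression, in the same style as the queries elsewhere on this page, and wraps it in a URL for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The Prometheus address and the helper names are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Assumption: a hypothetical Prometheus address, replace with your own.
PROMETHEUS_URL = "http://prometheus.example.com:9090"


def dropped_requests_query(window="5m"):
    """PromQL for the per-reason rate of dropped requests over the given window."""
    return f"sum(rate(gitaly_requests_dropped_total[{window}])) by (reason)"


def instant_query_url(base_url, promql):
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    params = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


query = dropped_requests_query()
print(query)
print(instant_query_url(PROMETHEUS_URL, query))
```

A sustained non-zero rate for either `max_size` or `max_time` indicates that the configured queue limits are being reached.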

Monitor Gitaly pack-objects concurrency limiting

You can observe specific behavior of pack-objects limiting using Gitaly logs and Prometheus.

In the Gitaly logs, you can identify logs related to the pack-objects concurrency limiting with entries such as:

Log Field	Description
`limit.concurrency_queue_length`	Current length of the queue for the pack-objects processes. Indicates the number of requests that are waiting to be processed because the limit on concurrent processes has been reached.
`limit.concurrency_queue_ms`	Time a request has spent waiting in the queue, in milliseconds. Indicates how long a request has had to wait because of the limits on concurrency.
`limit.limiting_key`	Remote IP of the sender.
`limit.limiting_type`	Type of process being limited. In this case, `pack-objects`.

Example log:

{
  "limit.concurrency_queue_length": 1,
  "limit.concurrency_queue_ms": 0,
  "limit.limiting_key": "1.2.3.4",
  "limit.limiting_type": "pack-objects"
}

In Prometheus, look for the following metrics:

gitaly_pack_objects_in_progress indicates how many pack-objects processes are being processed concurrently.
gitaly_pack_objects_queued indicates how many requests for pack-objects processes are waiting due to the concurrency limit being reached.
gitaly_pack_objects_acquiring_seconds indicates how long a request for a pack-object process has to wait due to concurrency limits before being processed.

Monitor Gitaly adaptive concurrency limiting

You can observe specific behavior of adaptive concurrency limiting using Gitaly logs and Prometheus.

Adaptive concurrency limiting is an extension of static concurrency limiting, so all metrics and logs applicable to static concurrency limiting are also relevant when monitoring adaptive limits. In addition, adaptive limiting introduces several specific metrics that help monitor the dynamic adjustment of limits.

Adaptive limiting logs

In the Gitaly logs, you can identify logs related to the adaptive concurrency limiting when the current limits are adjusted. You can filter the content of the logs (msg) for “Multiplicative decrease” and “Additive increase” messages.

These logs are emitted only at the debug severity level and can be verbose, but they provide detailed insight into adaptive limit adjustments.

Log Field	Description
`limit`	The name of the limit being adjusted.
`previous_limit`	The previous limit before it was increased or decreased.
`new_limit`	The new limit after it was increased or decreased.
`watcher`	The resource watcher that decided the node is under pressure. For example: `CgroupCpu` or `CgroupMemory`.
`reason`	The reason behind limit adjustment.
`stats.*`	Some statistics behind an adjustment decision. They are for debugging purposes.

Example log:

{
  "msg": "Multiplicative decrease",
  "limit": "pack-objects",
  "new_limit": 14,
  "previous_limit": 29,
  "reason": "cgroup CPU throttled too much",
  "watcher": "CgroupCpu",
  "stats.time_diff": 15.0,
  "stats.throttled_duration": 13.0,
  "stats.throttled_threshold": 0.5
}
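
To illustrate the filtering described above, here is a minimal Python sketch that picks out limit adjustments from JSON log lines by matching the `msg` field. It assumes one JSON object per line with the fields shown in the table. The function name `limit_adjustments` and the sample input are hypothetical.

```python
import json

# Messages that mark an adaptive limit adjustment, as described above.
ADJUSTMENT_MESSAGES = {"Multiplicative decrease", "Additive increase"}


def limit_adjustments(lines):
    """Yield (limit, previous_limit, new_limit, reason) for each adjustment entry."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        if entry.get("msg") not in ADJUSTMENT_MESSAGES:
            continue
        yield (
            entry.get("limit"),
            entry.get("previous_limit"),
            entry.get("new_limit"),
            entry.get("reason"),
        )


# Sample input: the first line is taken from the example log above.
sample = [
    '{"msg": "Multiplicative decrease", "limit": "pack-objects", "new_limit": 14, '
    '"previous_limit": 29, "reason": "cgroup CPU throttled too much", "watcher": "CgroupCpu"}',
    '{"msg": "some unrelated message"}',
]
for limit, previous, new, reason in limit_adjustments(sample):
    print(f"{limit}: {previous} -> {new} ({reason})")
```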

Adaptive limiting metrics

In Prometheus, look for the following metrics:

General concurrency limiting metrics, applicable to both static and adaptive limits:

gitaly_concurrency_limiting_in_progress - Number of requests being processed.
gitaly_concurrency_limiting_queued - Number of requests waiting in the queue due to concurrency limits.
gitaly_concurrency_limiting_acquiring_seconds - Time spent by requests waiting due to concurrency limits before processing begins.

Adaptive concurrency limiting specific metrics:

gitaly_concurrency_limiting_current_limit - A gauge showing the current limit value of an adaptive concurrency limit for each RPC type. Only adaptive limits are included in this metric.
gitaly_concurrency_limiting_backoff_events_total - Counter indicating the total number of backoff events, representing when and why limits are reduced due to resource pressure.
gitaly_concurrency_limiting_watcher_errors_total - Counter tracking errors that occur when Gitaly fails to retrieve resource data, which may impact Gitaly's ability to evaluate the current resource situation.

When investigating issues with adaptive limiting, correlate these metrics with the general concurrency limiting metrics and logs to get a complete picture of system behavior.

Monitor Gitaly cgroups

You can observe the status of control groups (cgroups) using Prometheus:

gitaly_cgroups_reclaim_attempts_total, a gauge for the total number of times there has been a memory reclaim attempt. This number resets each time a server is restarted.
gitaly_cgroups_cpu_usage, a gauge that measures CPU usage per cgroup.
gitaly_cgroup_procs_total, a gauge that measures the total number of processes Gitaly has spawned under the control of cgroups.
gitaly_cgroup_cpu_cfs_periods_total, a counter for the value of nr_periods.
gitaly_cgroup_cpu_cfs_throttled_periods_total, a counter for the value of nr_throttled.
gitaly_cgroup_cpu_cfs_throttled_seconds_total, a counter for the value of throttled_time in seconds.
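
One common way to read the CFS counters is to compare the change in throttled periods with the change in total periods between two scrapes. The Python sketch below shows that arithmetic. The function name and the sample values are hypothetical, and this is only one way to interpret these counters, not something Gitaly computes for you.

```python
def throttled_fraction(periods_before, periods_after, throttled_before, throttled_after):
    """Fraction of CFS periods that were throttled between two scrapes.

    Returns None when no periods elapsed or when a counter appears to have reset.
    """
    elapsed = periods_after - periods_before
    throttled = throttled_after - throttled_before
    if elapsed <= 0 or throttled < 0:
        return None
    return throttled / elapsed


# Hypothetical counter values from two consecutive scrapes of
# gitaly_cgroup_cpu_cfs_periods_total and gitaly_cgroup_cpu_cfs_throttled_periods_total.
fraction = throttled_fraction(1000, 1600, 50, 200)
print(fraction)
```

A fraction close to 1 means the cgroup was throttled in nearly every scheduling period during the interval.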

`pack-objects` cache

The following pack-objects cache metrics are available:

gitaly_pack_objects_cache_enabled, a gauge set to 1 when the cache is enabled. Available labels: dir and max_age.
gitaly_pack_objects_cache_lookups_total, a counter for cache lookups. Available label: result.
gitaly_pack_objects_generated_bytes_total, a counter for the number of bytes written into the cache.
gitaly_pack_objects_served_bytes_total, a counter for the number of bytes read from the cache.
gitaly_streamcache_filestore_disk_usage_bytes, a gauge for the total size of cache files. Available label: dir.
gitaly_streamcache_index_entries, a gauge for the number of entries in the cache. Available label: dir.

Some of these metrics start with gitaly_streamcache because they are generated by the streamcache internal library package in Gitaly.

Example:

gitaly_pack_objects_cache_enabled{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache",max_age="300"} 1
gitaly_pack_objects_cache_lookups_total{result="hit"} 2
gitaly_pack_objects_cache_lookups_total{result="miss"} 1
gitaly_pack_objects_generated_bytes_total 2.618649e+07
gitaly_pack_objects_served_bytes_total 7.855947e+07
gitaly_streamcache_filestore_disk_usage_bytes{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 2.6200152e+07
gitaly_streamcache_filestore_removed_total{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1
gitaly_streamcache_index_entries{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1
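
As a sketch of how this output could be used, the following Python code parses metric lines in the Prometheus text format and derives a cache hit ratio from `gitaly_pack_objects_cache_lookups_total`. The parser is deliberately simplified: it assumes no timestamps and no escaped quotes in label values. The function names are hypothetical.

```python
import re

# Simplified pattern for "name{labels} value" lines in the Prometheus text format.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric name, labels dict, float value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser cannot handle
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def cache_hit_ratio(samples):
    """Share of cache lookups that were hits, or None if there were no lookups."""
    lookups = {
        labels.get("result"): value
        for name, labels, value in samples
        if name == "gitaly_pack_objects_cache_lookups_total"
    }
    total = sum(lookups.values())
    return lookups.get("hit", 0.0) / total if total else None


# Lines taken from the example output above.
METRICS = """
gitaly_pack_objects_cache_lookups_total{result="hit"} 2
gitaly_pack_objects_cache_lookups_total{result="miss"} 1
gitaly_pack_objects_generated_bytes_total 2.618649e+07
"""
print(cache_hit_ratio(parse_samples(METRICS)))
```

With the example values (2 hits and 1 miss), two out of three lookups were served from the cache.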

Monitor Gitaly server-side backups

Monitor server-side repository backups with the following metrics:

gitaly_backup_latency_seconds, a histogram measuring the amount of time in seconds that each phase of a server-side backup takes. The different phases are refs, bundle, and custom_hooks and represent the type of data being processed at each stage.
gitaly_backup_bundle_bytes, a histogram measuring the upload data rate of Git bundles being pushed to object storage by the Gitaly backup service.

Use these metrics especially if your GitLab instance contains large repositories.

Queries

The following are some queries for monitoring Gitaly:

Use the following Prometheus query to observe the type of connections Gitaly is serving in a production environment:
```
sum(rate(gitaly_connections_total[5m])) by (type)
```
Use the following Prometheus query to monitor the authentication behavior of your GitLab installation:
```
sum(rate(gitaly_authentications_total[5m])) by (enforced, status)
```
In a system where authentication is configured correctly and where you have live traffic, you see something like this:
```
{enforced="true",status="ok"}  4424.985419441742
```
There may also be other numbers with rate 0, but you only have to take note of the non-zero numbers.
The only non-zero number should have enforced="true",status="ok". If you have other non-zero numbers, something is wrong in your configuration.
The status="ok" number reflects your current request rate. In the example above, Gitaly is handling about 4,400 requests per second.
Use the following Prometheus query to observe the Git protocol versions being used in a production environment:
```
sum(rate(gitaly_git_protocol_requests_total[1m])) by (grpc_method,git_protocol,grpc_service)
```
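
The authentication check described above (only `enforced="true",status="ok"` should be non-zero) could be automated along the following lines. This Python sketch assumes the query result has already been collected into a mapping of `(enforced, status)` label pairs to rates. The function name is hypothetical, and the label values other than `"true"` and `"ok"` are placeholders rather than values confirmed for Gitaly.

```python
# The only series that should have a non-zero rate, per the guidance above.
EXPECTED = ("true", "ok")


def unexpected_auth_series(rates):
    """Return the non-zero series other than enforced="true", status="ok".

    `rates` maps (enforced, status) label pairs to a request rate.
    """
    return {
        labels: rate
        for labels, rate in rates.items()
        if rate > 0 and labels != EXPECTED
    }


# Hypothetical query result; the second and third label pairs are placeholders.
healthy = {
    ("true", "ok"): 4424.985419441742,
    ("true", "placeholder-status"): 0.0,
    ("false", "ok"): 0.0,
}
print(unexpected_auth_series(healthy))
```

An empty result matches the correctly configured case. Any entry that is returned points at a series worth investigating.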

Monitor Gitaly Cluster

To monitor Gitaly Cluster (Praefect), you can use Prometheus metrics. Metrics can be scraped from two separate endpoints:

The default /metrics endpoint.
/db_metrics, which contains metrics that require database queries.

Default Prometheus `/metrics` endpoint

The following metrics are available from the /metrics endpoint:

gitaly_praefect_read_distribution, a counter to track distribution of reads. It has two labels:
- virtual_storage.
- storage.
They reflect configuration defined for this instance of Praefect.
gitaly_praefect_replication_latency_bucket, a histogram measuring the amount of time it takes for replication to complete after the replication job starts.
gitaly_praefect_replication_delay_bucket, a histogram measuring how much time passes between when the replication job is created and when it starts.
gitaly_praefect_connections_total, the total number of connections to Praefect.
gitaly_praefect_method_types, a count of accessor and mutator RPCs per node.

To monitor strong consistency, you can use the following Prometheus metrics:

gitaly_praefect_transactions_total, the number of transactions created and voted on.
gitaly_praefect_subtransactions_per_transaction_total, the number of times nodes cast a vote for a single transaction. This can happen multiple times if multiple references are getting updated in a single transaction.
gitaly_praefect_voters_per_transaction_total, the number of Gitaly nodes taking part in a transaction.
gitaly_praefect_transactions_delay_seconds, the server-side delay introduced by waiting for the transaction to be committed.
gitaly_hook_transaction_voting_delay_seconds, the client-side delay introduced by waiting for the transaction to be committed.

To monitor repository verification, use the following Prometheus metrics:

gitaly_praefect_verification_jobs_dequeued_total, the number of verification jobs picked up by the worker.
gitaly_praefect_verification_jobs_completed_total, the number of verification jobs completed by the worker. The result label indicates the end result of the jobs:
- valid indicates the expected replica existed on the storage.
- invalid indicates the replica expected to exist did not exist on the storage.
- error indicates the job failed and has to be retried.
gitaly_praefect_stale_verification_leases_released_total, the number of stale verification leases released.

You can also monitor the Praefect logs.

Database metrics `/db_metrics` endpoint

The following metrics are available from the /db_metrics endpoint:

gitaly_praefect_unavailable_repositories, the number of repositories that have no healthy, up-to-date replicas.
gitaly_praefect_replication_queue_depth, the number of jobs in the replication queue.
gitaly_praefect_verification_queue_depth, the total number of replicas pending verification.
gitaly_praefect_read_only_repositories, the number of repositories in read-only mode in a virtual storage.
- This metric was removed in GitLab 15.4.
