Queue routing rules

When the number of Sidekiq jobs grows to a certain scale, the system faces scalability issues. One is that queues tend to get longer, so high-urgency jobs have to wait until less urgent jobs finish. This head-of-line blocking may eventually affect the responsiveness of the system, especially for critical actions. In another scenario, the performance of some jobs degrades because other long-running or CPU-intensive jobs (such as computing or rendering jobs) run on the same machine.

To counter these issues, one effective solution is to split Sidekiq jobs into different queues and assign machines to handle each queue exclusively. For example, all CPU-intensive jobs could be routed to the cpu-bound queue and handled by a fleet of CPU-optimized instances. The queue topology differs between companies depending on their workloads and usage patterns. Therefore, GitLab supports a flexible mechanism for the administrator to route jobs based on their characteristics.

As an alternative to Queue selector, which configures Sidekiq cluster to listen to a specific set of workers or queues, GitLab also supports routing a job from a worker to the desired queue when it is scheduled. Sidekiq clients try to match a job against a configured list of routing rules. Rules are evaluated from first to last, and evaluation stops at the first rule that matches a given worker (first match wins). If the worker doesn’t match any rule, it falls back to the queue name generated from the worker name.

By default, if the routing rules are not configured (or are set to an empty array), all jobs are routed to the queue generated from the worker name.
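The first-match-wins evaluation and the fallback to the generated queue name can be illustrated with a minimal sketch. It is not GitLab's implementation: the `Worker` struct, the `route` method, and the use of lambdas in place of query strings are all hypothetical names chosen for illustration.

```ruby
# Hypothetical stand-in for a worker's attributes; not GitLab's real classes.
Worker = Struct.new(:generated_queue, :urgency, :resource_boundary, keyword_init: true)

# Each rule pairs a predicate with a target queue. A nil or empty target
# means "keep the queue generated from the worker name".
RULES = [
  [->(w) { w.resource_boundary != 'cpu' && w.urgency == 'high' }, 'high-urgency'],
  [->(w) { w.urgency == 'throttled' }, 'throttled']
].freeze

def route(worker, rules)
  rules.each do |matches, queue|
    next unless matches.call(worker)

    # First match wins: stop at the first rule whose predicate is true.
    return (queue.nil? || queue.empty?) ? worker.generated_queue : queue
  end

  # No rule matched (or the list is empty): fall back to the generated queue.
  worker.generated_queue
end

worker = Worker.new(generated_queue: 'authorized_projects',
                    urgency: 'high', resource_boundary: 'unknown')
puts route(worker, RULES) # matched by the first rule
puts route(worker, [])    # empty rule list: generated queue name
```

With an empty rule list the loop body never runs, which mirrors the default behaviour described above.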

Example configuration

In /etc/gitlab/gitlab.rb:

sidekiq['routing_rules'] = [
  # Do not re-route workers that require their own queue
  ['tags=needs_own_queue', nil],
  # Route all non-CPU-bound workers that are high urgency to `high-urgency` queue
  ['resource_boundary!=cpu&urgency=high', 'high-urgency'],
  # Route all database, gitaly and global search workers that are throttled to `throttled` queue
  ['feature_category=database,gitaly,global_search&urgency=throttled', 'throttled'],
  # Route all workers that contact external services to a `network-intensive` queue
  ['has_external_dependencies=true|feature_category=hooks|tags=network', 'network-intensive'],
  # Route all import workers to the queues generated by the worker name, for
  # example, JiraImportWorker to `jira_import`, SVNWorker to `svn_worker`
  ['feature_category=import', nil],
  # Wildcard matching, route the rest to `default` queue
  ['*', 'default']
]

The routing rules list is an ordered array of tuples, each pairing a query with its corresponding queue:

  • The query follows the worker matching query syntax.
  • The queue name must be a valid Sidekiq queue name. If the queue name is nil or an empty string, the worker is routed to the queue generated from the name of the worker instead.

The query supports wildcard matching *, which matches all workers. As a result, the wildcard query must stay at the end of the list or the rules after it are ignored.
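Because rules after a wildcard are never reached, it can be useful to check a rule list before applying it. The helper below is a hypothetical sketch, not part of GitLab; it assumes the rules use the `[query, queue]` array form shown in the example configuration.

```ruby
# Returns the rules that can never match because they follow a '*' rule.
# `unreachable_rules` is a hypothetical helper name used for illustration.
def unreachable_rules(rules)
  wildcard_index = rules.index { |query, _queue| query.strip == '*' }
  return [] if wildcard_index.nil?

  rules.drop(wildcard_index + 1)
end

rules = [
  ['*', 'default'],
  ['urgency=high', 'high-urgency'] # never evaluated: the wildcard above matches first
]
p unreachable_rules(rules)
```

An empty result means the wildcard, if present, is already the last rule.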

Note: Mixing queue routing rules and queue selectors requires care to ensure that all jobs are scheduled and picked up by appropriate Sidekiq workers.

Worker matching query

GitLab provides a simple query syntax to match a worker based on its attributes. This query syntax is employed by both Queue routing rules and Queue selector. A query includes two components:

  • Attributes that can be selected.
  • Operators used to construct a query.

Available attributes

Introduced in GitLab 13.1 (tags).

The worker matching query operates on the worker attributes described in the Sidekiq style guide. We support querying based on a subset of worker attributes:

  • feature_category - the GitLab feature category the queue belongs to. For example, the merge queue belongs to the source_code_management category.
  • has_external_dependencies - whether or not the queue connects to external services. For example, all importers have this set to true.
  • urgency - how important it is that this queue’s jobs run quickly. Can be high, low, or throttled. For example, the authorized_projects queue is used to refresh user permissions, and is high urgency.
  • worker_name - the worker name. The other attributes are typically more useful as they are more general, but this is available in case a particular worker needs to be selected.
  • name - the queue name generated from the worker name. The other attributes are typically more useful as they are more general, but this is available in case a particular queue needs to be selected. Because this is generated from the worker name, it does not change based on the result of other routing rules.
  • resource_boundary - whether the queue is bound by cpu or memory, or whether its boundary is unknown. For example, the ProjectExportWorker is memory bound as it has to load data in memory before saving it for export.
  • tags - short-lived annotations for queues. These are expected to frequently change from release to release, and may be removed entirely.

has_external_dependencies is a boolean attribute: only the exact string true is considered true, and everything else is considered false.

tags is a set, which means that = checks for intersecting sets, and != checks for disjoint sets. For example, tags=a,b selects queues that have tags a, b, or both. tags!=a,b selects queues that have neither of those tags.
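The boolean and set semantics described above can be expressed with Ruby's `Set`. This is an illustrative sketch only; the method names `parse_boolean`, `tags_match?`, and `tags_exclude?` are hypothetical and do not come from GitLab.

```ruby
require 'set'

# Only the exact string "true" counts as true; anything else is false.
def parse_boolean(value)
  value == 'true'
end

# `tags=a,b`: true when the worker has at least one of the listed tags
# (the two sets intersect).
def tags_match?(worker_tags, query_values)
  worker_tags.to_set.intersect?(query_values.to_set)
end

# `tags!=a,b`: true when the worker has none of the listed tags
# (the two sets are disjoint).
def tags_exclude?(worker_tags, query_values)
  worker_tags.to_set.disjoint?(query_values.to_set)
end

p tags_match?(%w[a c], %w[a b])   # worker has tag a
p tags_exclude?(%w[c], %w[a b])   # worker has neither a nor b
p parse_boolean('True')           # not the exact string "true"
```

Note that a worker with no tags never satisfies `tags=...` and always satisfies `tags!=...`, since the empty set is disjoint from every set.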

The attributes of each worker are hard-coded in the source code. For convenience, we generate a list of all available attributes in GitLab Community Edition and a list of all available attributes in GitLab Enterprise Edition.

Available operators

queue_selector supports the following operators, listed from highest to lowest precedence:

  • | - the logical OR operator. For example, query_a|query_b (where query_a and query_b are queries made up of the other operators here) will include queues that match either query.
  • & - the logical AND operator. For example, query_a&query_b (where query_a and query_b are queries made up of the other operators here) will only include queues that match both queries.
  • != - the NOT IN operator. For example, feature_category!=issue_tracking excludes all queues from the issue_tracking feature category.
  • = - the IN operator. For example, resource_boundary=cpu includes all queues that are CPU bound.
  • , - the concatenate set operator. For example, feature_category=continuous_integration,pages includes all queues from either the continuous_integration category or the pages category. The same selection can be written with the OR operator, but the set form is shorter and has lower precedence.

The operator precedence for this syntax is fixed: it’s not possible to make AND have higher precedence than OR.
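To make the operators concrete, here is a minimal evaluator sketch. It is not GitLab's parser. It assumes a query is split on | into alternatives first, and each alternative is split on & into terms that must all hold; this grouping is an assumption of the sketch. The names `term_matches?` and `query_matches?`, and the attribute hash with string keys, are hypothetical.

```ruby
# Evaluates one term such as "urgency=high" or "tags!=a,b" against a hash of
# worker attributes (string keys; `tags` holds an array).
def term_matches?(term, attrs)
  # Check for "!=" first so it is not misread as "=" preceded by "!".
  negate = term.include?('!=')
  name, values = term.split(negate ? '!=' : '=', 2)
  raise ArgumentError, "malformed term: #{term}" if values.nil?

  wanted = values.split(',')                # "," concatenates a set of values
  actual = Array(attrs[name]).map(&:to_s)   # scalar or array, compared as strings
  hit = (actual & wanted).any?              # "=" means the sets intersect
  negate ? !hit : hit                       # "!=" means they are disjoint
end

def query_matches?(query, attrs)
  return true if query.strip == '*' # a lone wildcard matches every worker

  query.split('|').any? do |alternative|
    alternative.split('&').all? { |term| term_matches?(term, attrs) }
  end
end

hook_worker = {
  'feature_category' => 'hooks',
  'urgency' => 'low',
  'resource_boundary' => 'unknown',
  'has_external_dependencies' => false,
  'tags' => []
}
p query_matches?('has_external_dependencies=true|feature_category=hooks|tags=network', hook_worker)
p query_matches?('resource_boundary!=cpu&urgency=high', hook_worker)
```

The sketch does no validation of attribute names or values; a real parser would reject unknown attributes rather than silently treating them as unmatched.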

In GitLab 12.9 and later, as with the standard queue group syntax above, a single * as the entire queue group selects all queues.

Migration

After the Sidekiq routing rules are changed, administrators need to take care with the migration to avoid losing jobs entirely, especially in a system with long queues of jobs. The migration can be done by following the steps described in Sidekiq job migration.

Workers that cannot be migrated

Some workers cannot share a queue with other workers - typically because they check the size of their own queue - and so must be excluded from this process. We recommend excluding these from any further worker routing by adding a rule to keep them in their own queue, for example:

sidekiq['routing_rules'] = [
  ['tags=needs_own_queue', nil],
  # ...
]

These queues will also need to be included in at least one Sidekiq queue group.

The following table shows the workers that should have their own queue:

| Worker name | Queue name | GitLab issue |
| --- | --- | --- |
| EmailReceiverWorker | email_receiver | gitlab-com/gl-infra/scalability#1263 |
| ServiceDeskEmailReceiverWorker | service_desk_email_receiver | gitlab-com/gl-infra/scalability#1263 |
| ProjectImportScheduleWorker | project_import_schedule | gitlab-org/gitlab#340630 |
| HashedStorage::MigratorWorker | hashed_storage:hashed_storage_migrator | gitlab-org/gitlab#340629 |
| HashedStorage::ProjectMigrateWorker | hashed_storage:hashed_storage_project_migrate | gitlab-org/gitlab#340629 |
| HashedStorage::ProjectRollbackWorker | hashed_storage:hashed_storage_project_rollback | gitlab-org/gitlab#340629 |
| HashedStorage::RollbackerWorker | hashed_storage:hashed_storage_rollbacker | gitlab-org/gitlab#340629 |
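
The requirement that each of these queues appears in at least one Sidekiq queue group can be checked mechanically. The sketch below is hypothetical: `missing_own_queues` is not a GitLab helper, and it assumes each queue group is written as a comma-separated string of explicit queue names (it does not interpret wildcard or query-based groups).

```ruby
# Queue names taken from the table above.
OWN_QUEUE_NAMES = %w[
  email_receiver
  service_desk_email_receiver
  project_import_schedule
  hashed_storage:hashed_storage_migrator
  hashed_storage:hashed_storage_project_migrate
  hashed_storage:hashed_storage_project_rollback
  hashed_storage:hashed_storage_rollbacker
].freeze

# Returns the own-queue names that are not listed in any queue group.
def missing_own_queues(queue_groups)
  listed = queue_groups.flat_map { |group| group.split(',').map(&:strip) }
  OWN_QUEUE_NAMES - listed
end

groups = [
  'email_receiver,service_desk_email_receiver,project_import_schedule',
  'default,high-urgency'
]
puts missing_own_queues(groups) # prints the four hashed_storage queues
```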