Database load balancing
With database load balancing, read-only queries can be distributed across multiple PostgreSQL nodes to increase performance.
This documentation provides a technical overview of how database load balancing is implemented in GitLab Rails and Sidekiq.
Nomenclature
- Host: A database host, either a primary or a replica.
- Primary: The primary PostgreSQL host, used for write-only and read-and-write operations.
- Replica: A secondary PostgreSQL host, used for read-only operations.
- Workload: A Rails request or a Sidekiq job that requires database connections.
Components
A few Ruby classes are involved in the load balancing process. All of them are in the namespace Gitlab::Database::LoadBalancing:
- Host
- LoadBalancer
- ConnectionProxy
- Session
Each workload begins with a new instance of Gitlab::Database::LoadBalancing::Session. The Session keeps track of the database operations that have been performed and, based on them, determines whether the workload requires a connection to the primary host or to a replica host.
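The following is a minimal sketch of the per-workload lifecycle. It assumes the session is stored in a thread-local variable for the duration of one workload. WorkloadSession and run_workload are hypothetical names used for illustration and are not GitLab's Session API. The sketch tracks only whether a write has happened.

```ruby
# Hypothetical sketch: one session object per workload (Rails request or
# Sidekiq job). WorkloadSession and run_workload are illustrative names;
# keeping the session in a thread-local variable is an assumption.
class WorkloadSession
  KEY = :workload_session_sketch

  # Returns the session for the current thread, creating it on first use.
  def self.current
    Thread.current[KEY] ||= new
  end

  def self.clear!
    Thread.current[KEY] = nil
  end

  def initialize
    @performed_write = false
  end

  # Record that this workload has written to the database.
  def write!
    @performed_write = true
  end

  def performed_write?
    @performed_write
  end
end

# Wraps a workload so that it starts with a fresh session and does not
# leak its state into the next workload run on the same thread.
def run_workload
  WorkloadSession.clear!
  yield WorkloadSession.current
ensure
  WorkloadSession.clear!
end

run_workload do |session|
  session.write!
  session.performed_write?   # => true
end
run_workload { |session| session.performed_write? }  # => false
```

Because each workload gets its own session, a write recorded by one request or job cannot influence the routing decisions of the next workload that runs on the same thread.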
When the workload requires a database connection through ActiveRecord, ConnectionProxy first redirects the connection request to LoadBalancer.
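The proxy arrangement can be sketched as follows. FakeConnection, FakeLoadBalancer, and ConnectionProxySketch are hypothetical stand-ins. The method names select_all and execute mirror ActiveRecord connection methods. Sending every execute call to the primary is a simplification made only for this sketch.

```ruby
# Hypothetical stand-ins: a connection that reports which host ran the SQL,
# and a load balancer that yields a connection of the requested kind.
class FakeConnection
  def initialize(host)
    @host = host
  end

  def select_all(sql)
    "#{@host} ran: #{sql}"
  end

  def execute(sql)
    "#{@host} ran: #{sql}"
  end
end

class FakeLoadBalancer
  def read
    yield FakeConnection.new("replica")
  end

  def read_write
    yield FakeConnection.new("primary")
  end
end

# The proxy looks like a connection to its callers but never holds one:
# each call borrows a connection from the load balancer for that query.
class ConnectionProxySketch
  def initialize(load_balancer)
    @load_balancer = load_balancer
  end

  # Read-only query: ask the load balancer for a read connection.
  def select_all(sql)
    @load_balancer.read { |connection| connection.select_all(sql) }
  end

  # Treated as a write in this sketch: ask for a read_write connection.
  def execute(sql)
    @load_balancer.read_write { |connection| connection.execute(sql) }
  end
end

proxy = ConnectionProxySketch.new(FakeLoadBalancer.new)
proxy.select_all("SELECT 1")  # => "replica ran: SELECT 1"
proxy.execute("DELETE FROM t") # => "primary ran: DELETE FROM t"
```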
ConnectionProxy requests either a read or a read_write connection from LoadBalancer, depending on a few criteria:
- Whether the query is read-only or requires a write.
- Whether the Session has previously recorded a write operation.
- Whether any special blocks have been used to prefer the primary or a replica, such as:
  - use_primary
  - ignore_writes
  - use_replicas_for_read_queries
  - fallback_to_replicas_for_ambiguous_queries
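The criteria above could be combined roughly as follows. This sketch rests on several assumptions. RoutingSession and connection_kind are hypothetical names. Query intent is passed in directly as :read, :write, or :ambiguous. The effect given to each block is inferred from its name rather than taken from GitLab's code.

```ruby
# Hypothetical sketch of the routing decision. The meaning given to each
# block is an assumption inferred from its name.
class RoutingSession
  BLOCKS = %i[use_primary ignore_writes use_replicas_for_read_queries
              fallback_to_replicas_for_ambiguous_queries].freeze

  def initialize
    @performed_write = false
    @active = Hash.new(false)
  end

  def performed_write?
    @performed_write
  end

  def active?(name)
    @active[name]
  end

  # Writes made inside an ignore_writes block are not recorded.
  def write!
    @performed_write = true unless active?(:ignore_writes)
  end

  # Define one method per block. Each turns its flag on only while the
  # given block runs, then restores the previous value.
  BLOCKS.each do |name|
    define_method(name) do |&block|
      previous = @active[name]
      @active[name] = true
      begin
        block.call
      ensure
        @active[name] = previous
      end
    end
  end
end

# query_kind is :read, :write, or :ambiguous (intent unknown up front).
def connection_kind(session, query_kind)
  case query_kind
  when :write
    :read_write
  when :ambiguous
    session.active?(:fallback_to_replicas_for_ambiguous_queries) ? :read : :read_write
  else # :read
    if session.active?(:use_replicas_for_read_queries)
      :read
    elsif session.active?(:use_primary) || session.performed_write?
      :read_write
    else
      :read
    end
  end
end

session = RoutingSession.new
connection_kind(session, :read)                          # => :read
session.use_primary { connection_kind(session, :read) }  # => :read_write
session.ignore_writes { session.write! }
session.performed_write?                                 # => false
session.write!
connection_kind(session, :read)                          # => :read_write
```

The previous flag value is restored in an ensure clause. This keeps nested blocks correct even when the inner block raises.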
LoadBalancer then yields the requested connection from the respective database connection pool. It yields either:
- A read_write connection from the primary’s connection pool.
- A read connection from the replicas’ connection pools.
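The following sketch shows the pool selection, assuming one pool per host. SimplePool and LoadBalancerSketch are hypothetical. The round-robin rotation across replicas is an assumption. This sketch does not check whether a replica is online.

```ruby
# Hypothetical stand-in for a per-host connection pool.
class SimplePool
  attr_reader :host

  def initialize(host)
    @host = host
  end

  # A real pool would check a connection out and back in; this one just
  # yields a string naming the host.
  def with_connection
    yield "connection to #{@host}"
  end
end

class LoadBalancerSketch
  def initialize(primary_pool, replica_pools)
    @primary_pool = primary_pool
    @replica_pools = replica_pools
    @next_index = 0
  end

  # read_write connections always come from the primary's pool.
  def read_write(&block)
    @primary_pool.with_connection(&block)
  end

  # read connections come from the replicas' pools, rotated round-robin
  # (the rotation strategy is an assumption of this sketch).
  def read(&block)
    pool = @replica_pools[@next_index % @replica_pools.size]
    @next_index += 1
    pool.with_connection(&block)
  end
end

lb = LoadBalancerSketch.new(SimplePool.new("primary"),
                            [SimplePool.new("replica-1"), SimplePool.new("replica-2")])
lb.read_write { |c| c }  # => "connection to primary"
lb.read { |c| c }        # => "connection to replica-1"
lb.read { |c| c }        # => "connection to replica-2"
```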
When responding to a request for a read connection, LoadBalancer first attempts to load balance the connection across the replica hosts. It looks for the next online replica host and yields a connection from that host's connection pool.
A replica host is considered online if it is up to date with the primary, based on either the replication lag size or the replication lag time. The thresholds for both are configurable.
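The online check and the search for the next online replica might look like the sketch below. ReplicaStatus, OnlineCheckSketch, next_online_replica, and the threshold values are illustrative assumptions. Reading "either" as "within at least one of the two thresholds" is also an assumption. The text does not cover the case where no replica qualifies, so this sketch returns nil.

```ruby
# Hypothetical snapshot of a replica's lag, measured in bytes and seconds.
ReplicaStatus = Struct.new(:name, :lag_bytes, :lag_seconds)

class OnlineCheckSketch
  # Both thresholds are configurable, as described above.
  def initialize(max_lag_bytes:, max_lag_seconds:)
    @max_lag_bytes = max_lag_bytes
    @max_lag_seconds = max_lag_seconds
  end

  # Up to date if EITHER lag measure is within its threshold (assumption).
  def online?(replica)
    replica.lag_bytes <= @max_lag_bytes || replica.lag_seconds <= @max_lag_seconds
  end
end

# Starting at start_index, return the first online replica, wrapping
# around the list; nil if no replica qualifies.
def next_online_replica(replicas, start_index, check)
  replicas.size.times do |offset|
    candidate = replicas[(start_index + offset) % replicas.size]
    return candidate if check.online?(candidate)
  end
  nil
end

# Illustrative thresholds: 8 MiB of lag or 60 seconds.
check = OnlineCheckSketch.new(max_lag_bytes: 8 * 1024 * 1024, max_lag_seconds: 60)
replicas = [ReplicaStatus.new("replica-1", 50_000_000, 120),  # both over limits
            ReplicaStatus.new("replica-2", 1_000, 300)]       # small byte lag
next_online_replica(replicas, 0, check).name  # => "replica-2"
```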