Database load balancing
With database load balancing, read-only queries can be distributed across multiple PostgreSQL nodes to increase performance.
This documentation provides a technical overview of how database load balancing is implemented in GitLab Rails and Sidekiq.
Nomenclature
- Host: Each database host. It could be a primary or a replica.
- Primary: Primary PostgreSQL host that is used for write-only and read-and-write operations.
- Replica: Secondary PostgreSQL hosts that are used for read-only operations.
- Workload: A Rails request or a Sidekiq job that requires database connections.
Components
A few Ruby classes are involved in the load balancing process. All of them are in the namespace Gitlab::Database::LoadBalancing:
- Host
- LoadBalancer
- ConnectionProxy
- Session
Each workload begins with a new instance of Gitlab::Database::LoadBalancing::Session. The Session keeps track of the database operations that have been performed. It then determines if the workload requires a connection to either the primary host or a replica host.
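The bookkeeping described above can be pictured with a minimal sketch. SketchSession and its methods are hypothetical names chosen for illustration, not GitLab's actual Session class, and the rule that a recorded write sends the rest of the workload to the primary is an assumption made for the example.

```ruby
# Hypothetical sketch of per-workload state; not GitLab's actual Session class.
class SketchSession
  def initialize
    @performed_write = false
    @use_primary = false
  end

  # Record that the workload has performed a write operation.
  def write!
    @performed_write = true
  end

  # Explicitly pin the rest of the workload to the primary.
  def use_primary!
    @use_primary = true
  end

  def performed_write?
    @performed_write
  end

  # Assumption: a workload needs the primary once it has written
  # or has been pinned explicitly.
  def use_primary?
    @use_primary || @performed_write
  end
end

session = SketchSession.new
session.use_primary? # => false
session.write!
session.use_primary? # => true
```

Keeping this state in one object per workload means the routing decision never has to inspect earlier queries directly; it only asks the session.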
When the workload requires a database connection through ActiveRecord, ConnectionProxy first redirects the connection request to LoadBalancer. ConnectionProxy requests either a read or read_write connection from the LoadBalancer depending on a few criteria:
- Whether the query is read-only or requires a write.
- Whether the Session has recorded a write operation previously.
- Whether any special blocks have been used to prefer primary or replica, such as:
  - use_primary
  - ignore_writes
  - use_replicas_for_read_queries
  - fallback_to_replicas_for_ambiguous_queries
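As a rough illustration of how these criteria could combine, the sketch below picks a connection kind from the query type and the session state. SessionState and connection_kind are hypothetical names, not part of GitLab's ConnectionProxy, and only the use_primary case is modeled; the other special blocks are left out because their exact semantics are not described here.

```ruby
# Hypothetical sketch of the routing decision; not GitLab's ConnectionProxy.
SessionState = Struct.new(:performed_write, :use_primary, keyword_init: true)

# Returns :read_write when the primary is needed, :read otherwise.
def connection_kind(query_is_write:, state:)
  return :read_write if query_is_write        # writes always need the primary
  return :read_write if state.use_primary     # workload was pinned to the primary
  return :read_write if state.performed_write # assumption: reads follow an earlier write
  :read
end

fresh = SessionState.new(performed_write: false, use_primary: false)
connection_kind(query_is_write: false, state: fresh) # => :read
connection_kind(query_is_write: true, state: fresh)  # => :read_write
```

The order of the checks does not change the result in this sketch, since every condition before the last line leads to the same answer.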
LoadBalancer then yields the requested connection from the respective database connection pool. It yields either:
- A read_write connection from the primary’s connection pool.
- A read connection from the replicas’ connection pools.
When responding to a request for a read connection, LoadBalancer first attempts to load balance the connection across the replica hosts. It looks for the next online replica host and yields a connection from the host’s connection pool. A replica host is considered online if it is up-to-date with the primary, based on either the replication lag size or time. The thresholds for these requirements are configurable.
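The selection of the next online replica can be sketched as a round-robin scan with a lag check. ReplicaHost, SketchLoadBalancer, the field names, and the threshold values below are all hypothetical. Treating "either" as a logical OR between the two lag checks is also an assumption.

```ruby
# Hypothetical sketch; names, fields, and thresholds are illustrative only.
ReplicaHost = Struct.new(:name, :lag_bytes, :lag_seconds, keyword_init: true)

class SketchLoadBalancer
  def initialize(hosts, max_lag_bytes:, max_lag_seconds:)
    @hosts = hosts
    @max_lag_bytes = max_lag_bytes
    @max_lag_seconds = max_lag_seconds
    @index = 0
  end

  # Assumption: a host is online if it is within either threshold.
  def online?(host)
    host.lag_seconds <= @max_lag_seconds || host.lag_bytes <= @max_lag_bytes
  end

  # Scans each host at most once, starting where the previous call stopped.
  # Returns the next online host, or nil when none qualifies.
  def next_online_host
    @hosts.length.times do
      host = @hosts[@index]
      @index = (@index + 1) % @hosts.length
      return host if online?(host)
    end
    nil
  end
end

hosts = [
  ReplicaHost.new(name: "replica-a", lag_bytes: 100, lag_seconds: 1),
  ReplicaHost.new(name: "replica-b", lag_bytes: 50_000_000, lag_seconds: 120),
  ReplicaHost.new(name: "replica-c", lag_bytes: 0, lag_seconds: 5)
]
balancer = SketchLoadBalancer.new(hosts, max_lag_bytes: 8_000_000, max_lag_seconds: 60)
balancer.next_online_host.name # => "replica-a"
balancer.next_online_host.name # => "replica-c" (replica-b exceeds both thresholds)
```

What happens when no replica is online is not covered in this section, so the sketch simply returns nil in that case.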