Geo proxying

Secondaries proxy nearly all HTTP requests through Workhorse to the primary, so users navigating to the secondary see a read-write UI and can perform every operation available to them on the primary.

High-level components

Proxying of GitLab UI and API HTTP requests is handled by the gitlab-workhorse component. Traffic usually sent to the Rails application on the Geo secondary site is proxied to the internal URL of the primary Geo site instead.

Proxying of Git over HTTP requests is handled by the gitlab-workhorse component, but the decision to proxy or not is handled by the Rails application, taking into account whether the request is push or pull, and whether the desired Git data is up-to-date.

Proxying of Git over SSH traffic is handled by the gitlab-shell component, but the decision to proxy or not is handled by the Rails application, taking into account whether the request is push or pull, and whether the desired Git data is up-to-date.
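The decision the Rails application makes for Git traffic can be sketched as a small predicate. This is a minimal illustration, not GitLab's code: the function name `should_proxy_git` and its parameters are hypothetical, and treating every push as proxied is an assumption based on this page showing only proxied push flows.

```python
def should_proxy_git(operation: str, synced: bool, up_to_date: bool) -> bool:
    """Return True when the secondary should hand the request to the primary."""
    if operation not in ("pull", "push"):
        raise ValueError(f"unknown Git operation: {operation!r}")
    if operation == "push":
        # Assumption: pushes always go to the primary, which handles the write.
        return True
    # A pull is served locally only from a synced, up-to-date repository.
    return not (synced and up_to_date)
```

With this sketch, `should_proxy_git("pull", True, True)` is the only combination that keeps the request on the secondary.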

Request lifecycle

Top-level view

The proxying interaction can be explained at a high level through the following diagram:

client -> secondary: GET /explore
secondary -> primary: GET /explore (proxied)
primary -> secondary: HTTP/1.1 200 OK [..]
secondary -> client: HTTP/1.1 200 OK [..]

Proxy detection mechanism

To learn whether it should proxy requests to the primary, and to obtain the URL of the primary (as stored in the database), Workhorse polls the internal API when Geo is enabled. When proxying should be enabled, the internal API responds with the primary URL and JWT-signed data that is passed on to the primary with every request.

Loop (poll every 10 seconds):
Workhorse (secondary) -> Internal Rails API: GET /api/v4/geo/proxy (internal)
Internal Rails API -> Workhorse (secondary): {geo_proxy_primary_url, geo_proxy_extra_data}, update config
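A minimal sketch of how a poller might interpret one response, assuming the JSON body carries the two fields named above. The `ProxyConfig` class and `apply_poll_response` function are hypothetical names for illustration and do not reflect Workhorse's actual implementation; treating a missing or empty primary URL as "do not proxy" is also an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProxyConfig:
    """Hypothetical holder for the state a secondary keeps between polls."""
    primary_url: Optional[str] = None
    extra_data: Optional[str] = None

    @property
    def proxy_enabled(self) -> bool:
        # Assumption: proxying is on only while a primary URL is known.
        return bool(self.primary_url)


def apply_poll_response(config: ProxyConfig, body: str) -> ProxyConfig:
    """Update the config from one GET /api/v4/geo/proxy response body."""
    data = json.loads(body)
    # Empty or absent values are normalized to None.
    config.primary_url = data.get("geo_proxy_primary_url") or None
    config.extra_data = data.get("geo_proxy_extra_data") or None
    return config
```

Calling `apply_poll_response` once per poll interval keeps the configuration in step with the database without restarting the process.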

In-depth request flow and local data acceleration compared with proxying

In more detail, Workhorse on the secondary (requested) site decides whether to proxy the request. If it can “accelerate” the data type (that is, serve it locally to save a roundtrip request), it returns the data immediately. Otherwise, traffic is sent to the primary’s internal URL and served by Workhorse on the primary exactly as a direct request would be. The response is then proxied back to the user through the secondary Workhorse in the same connection.

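Client -> Workhorse (secondary) -> Serve data locally?
Yes: Workhorse (secondary) returns the data to the client.
No (proxy): Workhorse (secondary) -> Workhorse (primary), and the response returns through Workhorse (secondary) to the client.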
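The serve-locally-or-proxy choice can be sketched as follows. This is an illustration under stated assumptions: `route_request` is a hypothetical name, the caller supplies the `can_accelerate` predicate because this page does not list which data types are accelerated, and serving locally when no primary URL is known is an assumption.

```python
from typing import Callable, Optional, Tuple


def route_request(
    path: str,
    can_accelerate: Callable[[str], bool],
    primary_url: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Return ("local", None) or ("proxy", target URL on the primary)."""
    # Assumption: with no known primary URL there is nothing to proxy to.
    if primary_url is None or can_accelerate(path):
        return ("local", None)
    # Otherwise forward the same path to the primary's internal URL.
    return ("proxy", primary_url.rstrip("/") + path)
```

Keeping the predicate separate from the routing makes it possible to change what counts as accelerated without touching the proxy path.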

Sign-in

Requests proxied to the primary requiring authorization

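1. Client -> Secondary: /group/project request
2. Secondary -> Primary: proxy /group/project
Steps 3 to 10 are optional and apply when the user is not signed in on the primary:
3. Primary -> Secondary: 302 redirect
4. Secondary -> Client: proxy 302 redirect
5. Client -> Secondary: /users/sign_in
6. Secondary -> Primary: proxy /users/sign_in
Note (Primary): authentication happens, POST to same URL etc
7. Primary -> Secondary: 302 redirect
8. Secondary -> Client: proxy 302 redirect
9. Client -> Secondary: /group/project
10. Secondary -> Primary: proxy /group/project
11. Primary -> Secondary: /group/project logged in response (session on primary created)
12. Secondary -> Client: proxy full response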

Git pull

For historical reasons, the push_from_secondary path is used to forward a Git pull. There is an issue proposing to rename this route to avoid confusion.

Git pull over HTTP(S)

Accelerated repositories

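When a repository exists on the secondary and we detect that it is up to date with the primary, we serve it directly instead of proxying.

Note: Rails (secondary) decides that the repo is synced and up to date.
Git client -> Workhorse (secondary): GET /foo/bar.git/info/refs/?service=git-upload-pack
Workhorse (secondary) -> Rails (secondary): <internal API check>
Rails (secondary) -> Workhorse (secondary): 401 Unauthorized
Workhorse (secondary) -> Git client: <response>
Git client -> Workhorse (secondary): GET /foo/bar.git/info/refs/?service=git-upload-pack
Workhorse (secondary) -> Rails (secondary): <internal API check>
Rails (secondary) -> Workhorse (secondary): Render Workhorse OK
Workhorse (secondary) -> Git client: 200 OK
Git client -> Workhorse (secondary): POST /foo/bar.git/git-upload-pack
Workhorse (secondary) -> Rails (secondary): GitHttpController
Rails (secondary) -> Workhorse (secondary): Render Workhorse OK
Workhorse (secondary) -> Gitaly (secondary): Workhorse gets the connection details from Rails, connects to Gitaly: SmartHTTP Service, UploadPack RPC (check the proto for details)
Gitaly (secondary) -> Workhorse (secondary): Return a stream of Proto messages
Workhorse (secondary) -> Git client: Pipe messages to the Git client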

Proxied repositories

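If a requested repository isn’t synced, or we detect that it is not up to date, the request is proxied to the primary to get the latest version of the changes.

Note: Rails (secondary) decides that the repo is out of date, so the requests to /-/push_from_secondary are proxied.
Git client -> Workhorse (secondary): GET /foo/bar.git/info/refs/?service=git-upload-pack
Workhorse (secondary) -> Rails (secondary): <response>
Rails (secondary) -> Workhorse (secondary): 302 Redirect to /-/push_from_secondary/2/foo/bar.git/info/refs?service=git-upload-pack
Workhorse (secondary) -> Git client: <response>
Git client -> Workhorse (secondary): GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-upload-pack
Workhorse (secondary) -> Workhorse (primary): <proxied request>
Workhorse (primary) -> Rails (primary): <data>
Rails (primary) -> Workhorse (primary): 401 Unauthorized
Workhorse (primary) -> Workhorse (secondary): <proxied response>
Workhorse (secondary) -> Git client: <response>
Git client -> Workhorse (secondary): GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-upload-pack
Workhorse (secondary) -> Workhorse (primary): <proxied request>
Workhorse (primary) -> Rails (primary): <data>
Rails (primary) -> Workhorse (primary): Render Workhorse OK
Workhorse (primary) -> Workhorse (secondary): <proxied response>
Workhorse (secondary) -> Git client: <response>
Git client -> Workhorse (secondary): POST /-/push_from_secondary/2/foo/bar.git/git-upload-pack
Workhorse (secondary) -> Workhorse (primary): <proxied request>
Workhorse (primary) -> Rails (primary): GitHttpController
Rails (primary) -> Workhorse (primary): Render Workhorse OK
Workhorse (primary) -> Gitaly (primary): Workhorse gets the connection details from Rails, connects to Gitaly: SmartHTTP Service, UploadPack RPC (check the proto for details)
Gitaly (primary) -> Workhorse (primary): Return a stream of Proto messages
Workhorse (primary) -> Workhorse (secondary): Pipe messages to the Git client
Workhorse (secondary) -> Git client: Return piped messages from Git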

Git pull over SSH

Because SSH operations go through GitLab Shell instead of Workhorse, they are not proxied through the mechanism used for Workhorse requests. Instead, the secondary Rails internal API proxies them to the primary site as Git HTTP requests.

Accelerated repositories

When a repository exists on the secondary and we detect that it is up to date with the primary, we serve it directly instead of proxying.

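Git client -> GitLab Shell (secondary): git pull
GitLab Shell (secondary) -> Internal API (secondary Rails): SSH key validation (api/v4/internal/authorized_keys?key=..)
Internal API (secondary Rails) -> GitLab Shell (secondary): HTTP/1.1 200 OK
GitLab Shell (secondary) -> Gitaly (secondary): InfoRefs:UploadPack RPC
Gitaly (secondary) -> GitLab Shell (secondary): stream Git response back
GitLab Shell (secondary) -> Git client: stream Git response back
Git client -> GitLab Shell (secondary): stream Git data to push
GitLab Shell (secondary) -> Gitaly (secondary): UploadPack RPC
Gitaly (secondary) -> GitLab Shell (secondary): stream Git response back
GitLab Shell (secondary) -> Git client: stream Git response back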

Proxied repositories

If a requested repository isn’t synced, or we detect that it is not up to date, the request is proxied to the primary to get the latest version of the changes.

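Git client -> GitLab Shell (secondary): git pull
GitLab Shell (secondary) -> Internal API (secondary Rails): SSH key validation (api/v4/internal/authorized_keys?key=..)
Internal API (secondary Rails) -> GitLab Shell (secondary): HTTP/1.1 300 (custom action status) with {endpoint, msg, primary_repo}
GitLab Shell (secondary) -> Internal API (secondary Rails): POST /api/v4/geo/proxy_git_ssh/info_refs_upload_pack
Internal API (secondary Rails) -> Primary API: POST $PRIMARY/foo/bar.git/info/refs/?service=git-upload-pack
Primary API -> Internal API (secondary Rails): HTTP/1.1 200 OK
Internal API (secondary Rails) -> GitLab Shell (secondary): <response>
GitLab Shell (secondary) -> Git client: return Git response from primary
Git client -> GitLab Shell (secondary): stream Git data to push
GitLab Shell (secondary) -> Internal API (secondary Rails): POST /api/v4/geo/proxy_git_ssh/upload_pack
Internal API (secondary Rails) -> Primary API: POST $PRIMARY/foo/bar.git/git-upload-pack
Primary API -> Internal API (secondary Rails): HTTP/1.1 200 OK
Internal API (secondary Rails) -> GitLab Shell (secondary): <response>
GitLab Shell (secondary) -> Git client: return Git response from primary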

Git push

Git push over SSH

Because SSH operations go through GitLab Shell instead of Workhorse, they are not proxied through the mechanism used for Workhorse requests. Instead, the secondary Rails internal API proxies them to the primary site as Git HTTP requests.

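Git client -> GitLab Shell (secondary): git push
GitLab Shell (secondary) -> Internal API (secondary Rails): SSH key validation (api/v4/internal/authorized_keys?key=..)
Internal API (secondary Rails) -> GitLab Shell (secondary): HTTP/1.1 300 (custom action status) with {endpoint, msg, primary_repo}
GitLab Shell (secondary) -> Internal API (secondary Rails): POST /api/v4/geo/proxy_git_ssh/info_refs_receive_pack
Internal API (secondary Rails) -> Primary API: POST $PRIMARY/foo/bar.git/info/refs/?service=git-receive-pack
Primary API -> Internal API (secondary Rails): HTTP/1.1 200 OK
Internal API (secondary Rails) -> GitLab Shell (secondary): <response>
GitLab Shell (secondary) -> Git client: return Git response from primary
Git client -> GitLab Shell (secondary): stream Git data to push
GitLab Shell (secondary) -> Internal API (secondary Rails): POST /api/v4/geo/proxy_git_ssh/receive_pack
Internal API (secondary Rails) -> Primary API: POST $PRIMARY/foo/bar.git/git-receive-pack
Primary API -> Internal API (secondary Rails): HTTP/1.1 200 OK
Internal API (secondary Rails) -> GitLab Shell (secondary): <response>
GitLab Shell (secondary) -> Git client: return Git response from primary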
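Both SSH flows, pull and push, call the secondary's internal API twice after the custom action response: once for the ref advertisement and once for the pack exchange. The following sketch lists those two calls in order. The endpoint paths are the ones named on this page; the function name `proxy_git_ssh_endpoints` and the `service` argument are hypothetical and only illustrate the pattern.

```python
from typing import List

BASE = "/api/v4/geo/proxy_git_ssh"


def proxy_git_ssh_endpoints(service: str) -> List[str]:
    """Return the internal API endpoints to call, in order, for one SSH operation."""
    if service not in ("upload_pack", "receive_pack"):
        raise ValueError(f"unsupported service: {service!r}")
    # First the ref advertisement, then the pack exchange itself.
    return [f"{BASE}/info_refs_{service}", f"{BASE}/{service}"]
```

Here `upload_pack` corresponds to a pull and `receive_pack` to a push.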

Git push over HTTP(S)

If a requested repository isn’t synced, or we detect that it is not up to date, the request is proxied to the primary. A push redirects to a local path formatted as /-/push_from_secondary/$SECONDARY_ID/*. Requests through this path are then proxied to the primary, which handles the push.

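Git client -> Workhorse (secondary): GET /foo/bar.git/info/refs/?service=git-receive-pack
Workhorse (secondary) -> Git client: 302 Redirect to /-/push_from_secondary/2/foo/bar.git/info/refs?service=git-receive-pack
Git client -> Workhorse (secondary): GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-receive-pack
Workhorse (secondary) -> Workhorse (primary): <proxied request>
Workhorse (primary) -> Rails (primary): <data>
Rails (primary) -> Workhorse (primary): 401 Unauthorized
Workhorse (primary) -> Workhorse (secondary): <proxied response>
Workhorse (secondary) -> Git client: <response>
Git client -> Workhorse (secondary): GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-receive-pack
Workhorse (secondary) -> Workhorse (primary): <proxied request>
Workhorse (primary) -> Rails (primary): <data>
Rails (primary) -> Workhorse (primary): Render Workhorse OK
Workhorse (primary) -> Workhorse (secondary): <proxied response>
Workhorse (secondary) -> Git client: <response>
Git client -> Workhorse (secondary): POST /-/push_from_secondary/2/foo/bar.git/git-receive-pack
Workhorse (secondary) -> Workhorse (primary): <proxied request>
Workhorse (primary) -> Rails (primary): GitHttpController:git_receive_pack
Rails (primary) -> Workhorse (primary): Render Workhorse OK
Workhorse (primary) -> Gitaly (primary): Get connection details from Rails and connects to SmartHTTP Service, ReceivePack RPC
Gitaly (primary) -> Workhorse (primary): Return a stream of Proto messages
Workhorse (primary) -> Workhorse (secondary): Pipe messages to the Git client
Workhorse (secondary) -> Git client: Return piped messages from Git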
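The /-/push_from_secondary/$SECONDARY_ID/* path format can be illustrated with a pair of helpers that add and remove the prefix. This is a sketch only: `to_secondary_path` and `from_secondary_path` are hypothetical names, and this page does not say where, or whether, GitLab splits the path this way.

```python
import re
from typing import Optional, Tuple

PREFIX = "/-/push_from_secondary"
_PATTERN = re.compile(r"^/-/push_from_secondary/(\d+)(/.*)$")


def to_secondary_path(secondary_id: int, path: str, query: str = "") -> str:
    """Prefix a Git HTTP path with /-/push_from_secondary/$SECONDARY_ID."""
    prefixed = f"{PREFIX}/{secondary_id}/{path.lstrip('/')}"
    return f"{prefixed}?{query}" if query else prefixed


def from_secondary_path(path: str) -> Optional[Tuple[int, str]]:
    """Split a prefixed path into (secondary ID, original path), or None."""
    match = _PATTERN.match(path)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

For example, `to_secondary_path(2, "/foo/bar.git/info/refs", "service=git-receive-pack")` produces the redirect target that appears in the sequence above.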