- 1. Definition
- 2. Data flow
- 3. Proposal
- 4. Evaluation
- 4.1. Pros
- 4.2. Cons
This document is a work-in-progress and represents a very early state of the Cells design. Significant aspects are not documented, though we expect to add them in the future. This is one possible architecture for Cells, and we intend to contrast this with alternatives before deciding which approach to implement. This documentation will be kept even if we decide not to implement this so that we can document the reasons for not choosing this approach.
GitLab Container Registry is a feature that allows storing Docker container images in GitLab.
GitLab Container Registry is a complex service that depends on PostgreSQL, Redis, and Object Storage. Work is currently underway to introduce Container Registry Metadata to optimize data storage and image retention policies.
GitLab Container Registry serves as a container for stored data, but on its own does not authenticate `docker login`.
`docker login` is executed with user credentials (which can be a personal access token) or with ephemeral CI build credentials.
Container Registry uses data deduplication.
This means that the same blob (image layer) shared between many Projects is stored only once.
Each layer is addressed by the SHA-256 digest of its content, which appears in blob URLs as `sha256:<hex>`.
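Deduplication follows from content addressing: the storage key of a blob is derived from the hash of its bytes, so identical layers map to the same key. The following is a minimal sketch under stated assumptions: a local directory stands in for Object Storage, and `store_blob` and the `blobs/sha256/<hex>` layout are hypothetical, not the registry's actual storage driver. It assumes GNU coreutils `sha256sum`.

```shell
# Hypothetical sketch: a local directory stands in for Object Storage.
# This is NOT the registry's storage driver; it only shows why content
# addressing stores identical layers once. Requires GNU coreutils.

# store_blob STORE_DIR FILE
# Copies FILE into STORE_DIR/blobs/sha256/<hex> unless a blob with the
# same digest is already stored, then prints "sha256:<hex>".
store_blob() {
  store_dir=$1
  file=$2
  # Hash via stdin so the output is "<hex>  -"; keep only the hex digest.
  hex=$(sha256sum < "$file" | cut -d' ' -f1)
  target="$store_dir/blobs/sha256/$hex"
  if [ ! -e "$target" ]; then
    mkdir -p "$store_dir/blobs/sha256"
    cp "$file" "$target"
  fi
  printf 'sha256:%s\n' "$hex"
}
```

Calling `store_blob` twice with two files of identical content returns the same digest, and the second call writes nothing new.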
`docker login` requests a time-limited JWT authentication token that is signed by GitLab and validated by the Container Registry service.
The JWT stores all authorized scopes (container repository images) and operation types (`push` or `pull`).
A single JWT authentication token can carry many authorized scopes.
This allows the Container Registry and the client to mount existing blobs from other scopes.
GitLab responds only with the scopes the requester is authorized for.
It is then up to the GitLab Container Registry to validate whether the given operation can be performed.
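Scopes use the form `type:name:actions`, for example `repository:gitlab-org/gitlab-build-images:push,pull`. The following is a minimal sketch of the final authorization check, assuming the signature has already been verified and the granted scopes have been flattened into a space-separated list of such strings. This is a simplification: the token payload actually carries them as a JSON `access` array. `scope_allows` is a hypothetical helper.

```shell
# scope_allows GRANTED_SCOPES TYPE NAME ACTION
# Hypothetical helper. GRANTED_SCOPES is a space-separated list of
# "type:name:action1,action2" entries (signature checks are out of scope).
# Returns 0 if any entry grants ACTION on TYPE:NAME, 1 otherwise.
scope_allows() {
  for entry in $1; do
    entry_type=${entry%%:*}      # text before the first ':'
    rest=${entry#*:}             # text after the first ':'
    entry_name=${rest%:*}        # text before the last ':'
    entry_actions=${rest##*:}    # text after the last ':'
    [ "$entry_type" = "$2" ] || continue
    [ "$entry_name" = "$3" ] || continue
    # Surround with commas so "pull" matches only a whole action name.
    case ",$entry_actions," in
      *",$4,"*) return 0 ;;
    esac
  done
  return 1
}
```

With `repository:gitlab-org/other:pull` granted, a `push` to `gitlab-org/other` is rejected even though the same token may allow `push` on another repository.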
The GitLab.com pages are always scoped to a Project. Each Project can have many container registry images attached.
Currently, on GitLab.com the actual registry service is served via `https://registry.gitlab.com`.
The main identifiable problems are:
- The authentication request (`https://gitlab.com/jwt/auth`) is processed by GitLab.com.
- `https://registry.gitlab.com` is run by an external service and uses its own data store.
- Data deduplication. Running the registry within a Cell would reduce the efficiency of data storage, since a blob is only deduplicated within a single registry's storage.
Obtain a JWT token from GitLab for the `gitlab-org/gitlab-build-images` repository:

```shell
curl \
  --user "username:password" \
  "https://gitlab/jwt/auth?client_id=docker&offline_token=true&service=container_registry&scope=repository:gitlab-org/gitlab-build-images:push,pull"
```
The result is an encoded and signed JWT. Its second base64url-encoded segment (segments are separated by `.`) contains JSON with the authorized scopes.
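The payload can be inspected locally. The following is a minimal sketch, assuming GNU coreutils `base64`. `jwt_payload` is a hypothetical helper. It maps the base64url alphabet back to standard base64 and restores the stripped `=` padding before decoding. It does not verify the signature, which remains the Container Registry's job.

```shell
# jwt_payload TOKEN
# Hypothetical helper: prints the decoded JSON payload (second
# '.'-separated segment) of a JWT. Does NOT verify the signature.
jwt_payload() {
  seg=$(printf '%s' "$1" | cut -d. -f2)
  # base64url uses '-' and '_' instead of '+' and '/'.
  seg=$(printf '%s' "$seg" | tr '_-' '/+')
  # base64url also drops '=' padding; restore it to a multiple of 4.
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
```

For example, `jwt_payload "$TOKEN"` prints the JSON claims, including the authorized scopes.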
List the tags of a container repository image and fetch the manifest of one tag, using the JWT in place of `token`:

```shell
curl \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  -H "Authorization: Bearer token" \
  https://registry.gitlab.com/v2/gitlab-org/gitlab-build-images/tags/list

curl \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  -H "Authorization: Bearer token" \
  https://registry.gitlab.com/v2/gitlab-org/gitlab-build-images/manifests/danger-ruby-2.6.6
```
Fetch an image layer (blob) by its digest:

```shell
curl \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  -H "Authorization: Bearer token" \
  https://registry.gitlab.com/v2/gitlab-org/gitlab-build-images/blobs/sha256:a3f2e1afa377d20897e08a85cae089393daa0ec019feab3851d592248674b416
```
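Because blobs are content-addressed, a client can check that the bytes it downloaded match the digest in the URL. The following is a minimal sketch, assuming the blob has already been saved to a local file (for example with curl's `-L` to follow a redirect and `-o` to write the file) and that GNU coreutils `sha256sum` is available. `verify_blob` is a hypothetical helper.

```shell
# verify_blob FILE DIGEST
# Hypothetical helper. DIGEST has the form "sha256:<hex>".
# Returns 0 if FILE hashes to DIGEST, 1 on mismatch, 2 if unsupported.
verify_blob() {
  case $2 in
    sha256:*) expected=${2#sha256:} ;;
    *) echo "unsupported digest algorithm: $2" >&2; return 2 ;;
  esac
  # Hash via stdin so the output is "<hex>  -"; keep only the hex digest.
  actual=$(sha256sum < "$1" | cut -d' ' -f1)
  [ "$actual" = "$expected" ]
}
```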
Because of its extensive and generally highly scalable horizontal architecture, we should evaluate whether the GitLab Container Registry should run not in a Cell but in a Cluster, scaled independently. This might be easier, but would definitely not offer the same degree of data isolation.
Alternatively, apart from `/jwt/auth`, which would likely have to be processed by the Router (to decode `scope`), the Container Registry could run as a local service of a Cell.
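Decoding `scope` amounts to pulling the repository name out of the `/jwt/auth` query string so the Router can pick a Cell. The following is a minimal sketch, assuming the query string is not percent-encoded (as in the curl example). A real Router would URL-decode it first. `scope_repository` is a hypothetical helper and not part of any existing GitLab component.

```shell
# scope_repository QUERY_STRING
# Hypothetical helper: prints the repository name from the first
# "scope=repository:<name>:<actions>" parameter of a /jwt/auth query.
# Assumes the query string is not percent-encoded.
scope_repository() {
  printf '%s\n' "$1" | tr '&' '\n' | while IFS= read -r param; do
    case $param in
      scope=repository:*)
        rest=${param#scope=repository:}
        printf '%s\n' "${rest%:*}"   # drop the trailing ":<actions>"
        break ;;
    esac
  done
}
```

This sketch only looks at the first `scope`. Because a single token can carry many scopes (for example to mount blobs from another repository), a real Router would have to handle requests whose scopes point at repositories in different Cells.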
At least on GitLab.com, the actual data is not forwarded through the registry but served directly from Object Storage / CDN.
The registry API encodes the container repository image in the URL path, which makes requests easily routable. We could likely reuse the same stateless Router service in front of the Container Registry to serve manifests and blob redirects.
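Routing on the URL means extracting the repository name from paths such as `/v2/<name>/tags/list`, `/v2/<name>/manifests/<reference>`, and `/v2/<name>/blobs/<digest>`. The following is a minimal sketch under stated assumptions: `repository_from_path` and `cell_for_repository` are hypothetical helpers, and the Cell names and the mapping by top-level namespace are illustrative placeholders rather than a decided routing rule.

```shell
# repository_from_path PATH
# Hypothetical helper: prints the repository name from a registry v2
# API path, or returns 1 if the path carries no repository name.
repository_from_path() {
  p=${1#/v2/}
  case $p in
    */manifests/*) echo "${p%/manifests/*}" ;;
    */blobs/*)     echo "${p%/blobs/*}" ;;
    */tags/list)   echo "${p%/tags/list}" ;;
    *) return 1 ;;
  esac
}

# cell_for_repository NAME
# Hypothetical static mapping from the top-level namespace to a Cell.
cell_for_repository() {
  case ${1%%/*} in
    gitlab-org) echo cell-1 ;;
    *)          echo cell-2 ;;
  esac
}
```

Endpoints without a repository in the path, such as `/v2/_catalog`, cannot be routed this way and would need separate handling.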
The only downside is the increased complexity of managing a standalone registry for each Cell, but this might be the desired approach.
There do not seem to be any theoretical problems with running GitLab Container Registry in a Cell, and the service appears easy to make routable. The practical complexity lies in operating a complex service from the infrastructure side.