This document describes how `kas` routes requests to concrete `agentk` instances.

GitLab must talk to the GitLab Kubernetes Agent Server (`kas`) to:
- Get information about connected agents. Read more.
- Interact with agents. Read more.
- Interact with Kubernetes clusters. Read more.
Each agent connects to an instance of `kas` and keeps an open connection. When
GitLab must talk to a particular agent, a `kas` instance connected to this agent must
be found, and the request routed to it.
For an architecture overview, see architecture.md.

For this architecture, this diagram shows a request to `agentk` 3, Pod1 for the
list of pods:
Each `kas` instance tracks the agents connected to it in Redis. For each agent, it
stores a serialized protobuf object with information about the agent. When an agent
disconnects, `kas` removes all corresponding information from Redis. For both events,
`kas` publishes a notification to a Redis pub-sub channel.
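To make the bookkeeping concrete, here is a minimal in-memory sketch of that per-agent, per-connection tracking with change notifications. It is illustrative only: the real `kas` stores serialized protobuf records in Redis and publishes on a Redis pub-sub channel, and all names here (`ConnectedAgentInfo`, `Registry`, `Event`) are invented for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// ConnectedAgentInfo mirrors the kind of per-connection record kas keeps in
// Redis (the real record is a serialized protobuf; these fields are illustrative).
type ConnectedAgentInfo struct {
	AgentID      int64
	ConnectionID int64 // unique per open connection from an agentk replica
	KasURL       string
}

// Event is the notification published when an agent connects or disconnects.
type Event struct {
	Connected bool
	Info      ConnectedAgentInfo
}

// Registry is an in-memory stand-in for the Redis data plus its pub-sub channel.
type Registry struct {
	mu     sync.Mutex
	agents map[int64]map[int64]ConnectedAgentInfo // agent ID -> connection ID -> info
	subs   []chan Event
}

func NewRegistry() *Registry {
	return &Registry{agents: map[int64]map[int64]ConnectedAgentInfo{}}
}

// Subscribe returns a channel that receives connect/disconnect notifications.
func (r *Registry) Subscribe() <-chan Event {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch := make(chan Event, 16)
	r.subs = append(r.subs, ch)
	return ch
}

// Register records one agent connection and publishes a notification.
func (r *Registry) Register(info ConnectedAgentInfo) {
	r.mu.Lock()
	defer r.mu.Unlock()
	conns := r.agents[info.AgentID]
	if conns == nil {
		conns = map[int64]ConnectedAgentInfo{}
		r.agents[info.AgentID] = conns
	}
	conns[info.ConnectionID] = info
	r.publish(Event{Connected: true, Info: info})
}

// Unregister removes all information for the connection and notifies subscribers.
func (r *Registry) Unregister(info ConnectedAgentInfo) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.agents[info.AgentID], info.ConnectionID)
	if len(r.agents[info.AgentID]) == 0 {
		delete(r.agents, info.AgentID)
	}
	r.publish(Event{Connected: false, Info: info})
}

func (r *Registry) publish(e Event) {
	for _, ch := range r.subs {
		select {
		case ch <- e:
		default: // don't block on slow subscribers
		}
	}
}

// ConnectionsFor returns all live connections for an agent.
func (r *Registry) ConnectionsFor(agentID int64) []ConnectedAgentInfo {
	r.mu.Lock()
	defer r.mu.Unlock()
	var out []ConnectedAgentInfo
	for _, info := range r.agents[agentID] {
		out = append(out, info)
	}
	return out
}

func main() {
	reg := NewRegistry()
	events := reg.Subscribe()
	reg.Register(ConnectedAgentInfo{AgentID: 3, ConnectionID: 1, KasURL: "grpc://kas-2"})
	fmt.Println(len(reg.ConnectionsFor(3)), (<-events).Connected)
}
```

The per-`ConnectionID` map is what lets a logically single agent have several replicas registered at once.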
Each agent, while logically a single entity, can have multiple replicas (multiple pods)
in a cluster. `kas` accommodates that and records per-replica (generally per-connection)
information. Each open `GetConfiguration()` streaming request is given
a unique identifier which, combined with the agent ID, identifies a particular
`agentk` instance.

gRPC can keep multiple TCP connections open for a single target host. `agentk` opens
its `GetConfiguration()` streaming request on one of them; `kas` uses that connection, and
doesn't see idle TCP connections because they are handled by the gRPC framework.
Each `kas` instance provides information to Redis, so other `kas` instances can
discover and access it.

Information is stored in Redis with an expiration time, to expire information for
`kas` instances that become unavailable. To prevent information from expiring too
quickly, `kas` periodically updates the expiration time for valid entries. Before
terminating, `kas` cleans up the information it adds into Redis.

Because `kas` must atomically update multiple data structures in Redis, it uses
transactions to ensure data consistency. Grouped data items must have the same
expiration time.
In addition to the existing `agentk` -> `kas` gRPC endpoint, `kas` exposes two new,
separate gRPC endpoints: one for GitLab and one for `kas` -> `kas` requests. Each
endpoint is a separate network listener, making it easier to control network access
to endpoints and allowing separate configuration for each endpoint.
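A sketch of the one-listener-per-endpoint layout. Plain TCP with a banner protocol is used here only to keep the example self-contained; in `kas` each listener carries its own gRPC server, which is what allows per-endpoint network policy, TLS, and authentication settings.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// serveBanner runs a trivial one-line protocol on its own listener, standing
// in for a dedicated gRPC server per endpoint.
func serveBanner(l net.Listener, banner string) {
	go func() {
		for {
			c, err := l.Accept()
			if err != nil {
				return // listener closed
			}
			go func(c net.Conn) {
				defer c.Close()
				fmt.Fprintln(c, banner)
			}(c)
		}
	}()
}

// startEndpoints opens one listener per endpoint, so network access and
// configuration can differ for the GitLab-facing and kas -> kas endpoints.
func startEndpoints() (privateAddr, apiAddr string, stop func(), err error) {
	private, err := net.Listen("tcp", "127.0.0.1:0") // kas -> kas endpoint
	if err != nil {
		return "", "", nil, err
	}
	api, err := net.Listen("tcp", "127.0.0.1:0") // GitLab -> kas endpoint
	if err != nil {
		private.Close()
		return "", "", nil, err
	}
	serveBanner(private, "private-endpoint")
	serveBanner(api, "api-endpoint")
	stop = func() { private.Close(); api.Close() }
	return private.Addr().String(), api.Addr().String(), stop, nil
}

// dialBanner connects to one endpoint and reads its banner line.
func dialBanner(addr string) (string, error) {
	c, err := net.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	defer c.Close()
	line, err := bufio.NewReader(c).ReadString('\n')
	if err != nil {
		return "", err
	}
	return line[:len(line)-1], nil
}

func main() {
	priv, api, stop, err := startEndpoints()
	if err != nil {
		panic(err)
	}
	defer stop()
	p, _ := dialBanner(priv)
	a, _ := dialBanner(api)
	fmt.Println(p, a)
}
```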
Databases, like PostgreSQL, aren’t used because the data is transient, with no need to reliably persist it.
GitLab authenticates with `kas` using JWT and the same shared secret used by the
`kas` -> GitLab communication. The JWT issuer should be `gitlab` and the audience
should be `gitlab-kas`.
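A minimal HS256 verification sketch for this handshake, using only the standard library. The issuer `gitlab` comes from the text above; the audience constant and the claim shape are assumptions for the example, and a real deployment would use a JWT library and also validate expiry.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

const (
	expectedIssuer   = "gitlab"
	expectedAudience = "gitlab-kas" // assumed audience value for the sketch
)

type claims struct {
	Iss string `json:"iss"`
	Aud string `json:"aud"`
}

// sign builds an HS256 JWT over the given JSON payload with the shared secret.
func sign(secret, payload []byte) string {
	header := base64.RawURLEncoding.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`))
	body := base64.RawURLEncoding.EncodeToString(payload)
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(header + "." + body))
	return header + "." + body + "." + base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
}

// verify checks the signature with the shared secret, then the issuer and
// audience claims, as the GitLab-facing endpoint would.
func verify(secret []byte, token string) error {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return errors.New("malformed token")
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(parts[0] + "." + parts[1]))
	sig, err := base64.RawURLEncoding.DecodeString(parts[2])
	if err != nil || !hmac.Equal(sig, mac.Sum(nil)) {
		return errors.New("bad signature")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return err
	}
	var c claims
	if err := json.Unmarshal(payload, &c); err != nil {
		return err
	}
	if c.Iss != expectedIssuer {
		return fmt.Errorf("unexpected issuer %q", c.Iss)
	}
	if c.Aud != expectedAudience {
		return fmt.Errorf("unexpected audience %q", c.Aud)
	}
	return nil
}

func main() {
	secret := []byte("shared-secret")
	token := sign(secret, []byte(`{"iss":"gitlab","aud":"gitlab-kas"}`))
	fmt.Println(verify(secret, token) == nil)
}
```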
When accessed through this endpoint, `kas` plays the role of request router.

If a request from GitLab comes in but no connected agent can handle it, `kas` blocks
and waits for a suitable agent to connect to it or to another `kas` instance. It
stops waiting when the client disconnects, or when some long timeout happens, such
as a client timeout.

`kas` is notified of new agent connections through a pub-sub channel to avoid
frequent polling. When a suitable agent connects, `kas` routes the request to it.
This endpoint is an implementation detail, an internal API, and should not be used
by any other system. It's protected by JWT using a secret shared among all `kas`
instances. No other system must have access to this secret.
When accessed through this endpoint, `kas` uses the request itself to determine
which `agentk` to send the request to. It prevents request cycles by only following
the instructions in the request, rather than doing discovery. It's the responsibility
of the `kas` instance receiving the request from the external endpoint to retry and
re-route requests. This method ensures a single central component for each request
can determine how a request is routed, rather than distributing the decision across
several `kas` instances.
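A tiny sketch of that rule: the internal endpoint routes purely from instructions carried in the request (an assumed `agent-id` metadata key here) and fails rather than falling back to its own discovery, so a mis-routed request produces an error for the originating `kas` to retry instead of bouncing between instances.

```go
package main

import (
	"fmt"
	"strconv"
)

// routeFromRequest picks the target agent using only metadata carried in the
// request. The "agent-id" key is illustrative, standing in for gRPC request
// metadata. No discovery happens here: an unroutable request is an error.
func routeFromRequest(md map[string]string) (int64, error) {
	raw, ok := md["agent-id"]
	if !ok {
		return 0, fmt.Errorf("request carries no routing instructions")
	}
	return strconv.ParseInt(raw, 10, 64)
}

func main() {
	id, err := routeFromRequest(map[string]string{"agent-id": "3"})
	fmt.Println(id, err == nil)
}
```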
This section explains how the `agentk` -> `kas` reverse gRPC tunnel is implemented.

For a video overview of how some of the blocks map to code, see GitLab Kubernetes
Agent reverse gRPC tunnel architecture and code overview.
In this example:

- The server side of module A exposes its API to get the list of pods on the
  Public API gRPC server. When it receives a request, it must determine
  the agent ID from it, then call the proxying code, which forwards the request to
  an `agentk` that can handle it.
- The agent side of module A exposes the same API on the Internal gRPC server.
  When it receives the request, it needs to handle it (such as retrieving and
  returning the list of pods).
This schema describes how reverse tunneling is handled fully transparently for modules, so you can add new features:
- `HandleTunnelConnection()` is called with the server-side interface of the reverse
  tunnel. It registers the connection and blocks, waiting for a request to proxy
  through the connection.
- `HandleIncomingConnection()` is called with the server-side interface of the incoming
  connection. It registers the connection and blocks, waiting for a matching tunnel
  to proxy the connection through.
- After it has two connections that match, the connection registry starts bi-directional
  data streaming between them.