Install the GitLab AI gateway
The AI gateway is a standalone service that provides access to AI-powered GitLab Duo features.
Install using Docker
Prerequisites:
- Install a container engine, such as Docker.
- Use a valid hostname accessible within your network. Do not use localhost.
The GitLab AI gateway Docker image contains all necessary code and dependencies in a single container.
The Docker image for the AI gateway is around 340 MB (compressed) for the linux/amd64
architecture and requires a minimum of 512 MB of RAM to operate. A GPU is not needed for the GitLab AI gateway. To ensure better performance, especially under heavy usage, consider allocating more disk space, memory, and resources than the minimum requirements. Higher RAM and disk capacity can enhance the AI gateway’s efficiency during peak loads.
Find the AI gateway release
Find the GitLab official Docker image at:
- AI gateway Docker image on Container Registry.
- AI gateway Docker image on DockerHub.
- Release process for self-hosted AI gateway.
Use the image tag that corresponds to your GitLab version. For example, if the GitLab version is v17.5.0, use the self-hosted-v17.5.0-ee tag.
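For example, to pull that tag from DockerHub (a sketch that assumes the gitlab/model-gateway repository path; confirm the path against the links above):

  docker pull gitlab/model-gateway:self-hosted-v17.5.0-ee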
Start a container from the image
- For Docker images with version self-hosted-17.4.0-ee and later, run the following:

  docker run -p 5052:5052 \
    -e AIGW_GITLAB_URL=<your_gitlab_instance> \
    -e AIGW_GITLAB_API_URL=https://<your_gitlab_domain>/api/v4/ \
    <image>
- From the container host, accessing http://localhost:5052/docs should open the AI gateway API documentation.
- Ensure that port 5052 is forwarded to the container from the host and is included in the AI_GATEWAY_URL environment variable.
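To verify the same thing from a shell on the container host, you can request the docs endpoint with curl (a quick check; adjust the hostname and port to your setup):

  curl -sf http://localhost:5052/docs >/dev/null && echo "AI gateway is responding"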
If you encounter issues loading the PEM file, resulting in errors like JWKError, you may need to resolve an SSL certificate error.
To fix this, set the appropriate certificate bundle path in the Docker container by using the following environment variables:
SSL_CERT_FILE=/path/to/ca-bundle.pem
REQUESTS_CA_BUNDLE=/path/to/ca-bundle.pem
Replace /path/to/ca-bundle.pem with the actual path to your certificate bundle.
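For example, a docker run invocation that mounts a CA bundle from the host and sets both variables might look like this (a sketch; the host and container paths are illustrative):

  docker run -p 5052:5052 \
    -v /host/path/ca-bundle.pem:/etc/ssl/certs/ca-bundle.pem \
    -e SSL_CERT_FILE=/etc/ssl/certs/ca-bundle.pem \
    -e REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-bundle.pem \
    -e AIGW_GITLAB_URL=<your_gitlab_instance> \
    -e AIGW_GITLAB_API_URL=https://<your_gitlab_domain>/api/v4/ \
    <image>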
Additional configuration
If you encounter authentication issues during health checks, bypass the authentication temporarily by setting the following environment variable:
-e AIGW_AUTH__BYPASS_EXTERNAL=true
This can be helpful for troubleshooting, but you should disable this after fixing the issues.
Install using the AI gateway Helm chart
Prerequisites:
- You must have a:
  - Domain you own, that you can add a DNS record to.
  - Kubernetes cluster.
  - Working installation of kubectl.
  - Working installation of Helm, version v3.11.0 or later.
For more information, see Test the GitLab chart on GKE or EKS.
Add the AI gateway Helm repository
Add the AI gateway Helm repository to Helm’s configuration:
helm repo add ai-gateway \
https://gitlab.com/api/v4/projects/gitlab-org%2fcharts%2fai-gateway-helm-chart/packages/helm/devel
Install the AI gateway
- Create the ai-gateway namespace:

  kubectl create namespace ai-gateway
- Generate the certificate for the domain where you plan to expose the AI gateway.
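  For example, a self-signed certificate can be generated with openssl (a sketch for testing only; production deployments should use a certificate issued by your CA, and <your_gateway_domain> is the domain where you expose the gateway):

  openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
    -keyout ai-gateway.key -out ai-gateway.crt \
    -subj "/CN=<your_gateway_domain>" \
    -addext "subjectAltName=DNS:<your_gateway_domain>"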
- Create the TLS secret in the previously created namespace:

  kubectl -n ai-gateway create secret tls ai-gateway-tls --cert="<path_to_cert>" --key="<path_to_cert_key>"
- For the AI gateway to access the API, it must know where the GitLab instance is located. To do this, set the gitlab.url and gitlab.apiUrl together with the ingress.hosts and ingress.tls values as follows:

  helm repo add ai-gateway \
    https://gitlab.com/api/v4/projects/gitlab-org%2fcharts%2fai-gateway-helm-chart/packages/helm/devel
  helm repo update
  helm upgrade --install ai-gateway \
    ai-gateway/ai-gateway \
    --version 0.1.1 \
    --namespace=ai-gateway \
    --set="image.tag=<ai-gateway-image>" \
    --set="gitlab.url=https://<your_gitlab_domain>" \
    --set="gitlab.apiUrl=https://<your_gitlab_domain>/api/v4/" \
    --set "ingress.enabled=true" \
    --set "ingress.hosts[0].host=<your_gateway_domain>" \
    --set "ingress.hosts[0].paths[0].path=/" \
    --set "ingress.hosts[0].paths[0].pathType=ImplementationSpecific" \
    --set "ingress.tls[0].secretName=ai-gateway-tls" \
    --set "ingress.tls[0].hosts[0]=<your_gateway_domain>" \
    --set="ingress.className=nginx" \
    --timeout=300s --wait --wait-for-jobs
This step can take a few seconds while all resources are allocated and the AI gateway starts.
You might need to set up your own Ingress controller for the AI gateway if your existing nginx Ingress controller does not serve services in a different namespace. Make sure Ingress is set up correctly for multi-namespace deployments.
To find the appropriate version of the ai-gateway Helm chart, run helm search repo ai-gateway --versions.
Wait for your pods to get up and running:
kubectl wait pod \
--all \
--for=condition=Ready \
--namespace=ai-gateway \
--timeout=300s
When your pods are up and running, you can set up your Ingress IP addresses and DNS records.
Upgrade the AI gateway Docker image
To upgrade the AI gateway, download the newest Docker image tag.
- Stop the running container:

  sudo docker stop gitlab-aigw

- Remove the existing container:

  sudo docker rm gitlab-aigw

- Pull and run the new image (see the example after this list).
- Ensure that the environment variables are all set correctly.
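For example, to pull and run the new image (a sketch; substitute the image path and tag for your release, and repeat any other environment variables from your original docker run command):

  docker pull <image>:<new_tag>
  docker run -d --name gitlab-aigw -p 5052:5052 \
    -e AIGW_GITLAB_URL=<your_gitlab_instance> \
    -e AIGW_GITLAB_API_URL=https://<your_gitlab_domain>/api/v4/ \
    <image>:<new_tag>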
Alternative installation methods
For information on alternative ways to install the AI gateway, see issue 463773.
Health check and debugging
To debug issues with your self-hosted Duo installation, run the following command:
sudo gitlab-rake gitlab:duo:verify_self_hosted_setup
Ensure that:
- The environment variable AI_GATEWAY_URL is correctly set.
- Duo access has been explicitly enabled for the root user through /admin/code_suggestions.
If access issues persist, check that authentication is correctly configured, and that the health check passes.
In case of persistent issues, the error message may suggest bypassing authentication with AIGW_AUTH__BYPASS_EXTERNAL=true, but only do this for troubleshooting.
You can also run a health check by going to Admin > GitLab Duo.
These tests are performed for offline environments:
| Test | Description |
|---|---|
| Network | Tests whether the environment variable AI_GATEWAY_URL has been set to a valid URL, and whether your instance can connect to that URL. If your instance cannot connect, ensure that your firewall or proxy server settings allow the connection. |
| License | Tests whether your license includes access to the Code Suggestions feature. |
| System exchange | Tests whether Code Suggestions can be used in your instance. If the system exchange assessment fails, users might not be able to use GitLab Duo features. |
Does the AI gateway need to autoscale?
Autoscaling is not mandatory but is recommended for environments with variable workloads, high concurrency requirements, or unpredictable usage patterns. In GitLab’s production environment:
- Baseline Setup: A single AI Gateway instance with 2 CPU cores and 8 GB RAM can handle approximately 40 concurrent requests.
- Scaling Guidelines: For larger setups, such as an AWS t3.2xlarge instance (8 vCPUs, 32 GB RAM), the gateway can handle up to 160 concurrent requests, equivalent to 4x the baseline setup.
- Request Throughput: GitLab.com's observed usage suggests that 7 RPS (requests per second) per 1000 active users is a reasonable metric for planning. For example, an instance with 5000 active users should plan for roughly 35 RPS.
- Autoscaling Options: Use Kubernetes Horizontal Pod Autoscalers (HPA) or similar mechanisms to dynamically adjust the number of instances based on metrics like CPU, memory utilization, or request latency thresholds.
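As an illustration, a minimal HPA manifest implementing the 50% CPU threshold discussed on this page might look like the following (a sketch; the Deployment name and namespace ai-gateway are assumptions that depend on your Helm release):

  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: ai-gateway
    namespace: ai-gateway
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: ai-gateway          # assumption: matches your Helm release name
    minReplicas: 1
    maxReplicas: 8
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50   # scale out when average CPU exceeds 50%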
Configuration examples by deployment size
- Small Deployment:
- Single instance with 2 vCPUs and 8 GB RAM.
- Handles up to 40 concurrent requests.
- Teams or organizations with up to 50 users and predictable workloads.
- Fixed instances may suffice; autoscaling can be disabled for cost efficiency.
- Medium Deployment:
- Single AWS t3.2xlarge instance with 8 vCPUs and 32 GB RAM.
- Handles up to 160 concurrent requests.
- Organizations with 50-200 users and moderate concurrency requirements.
- Implement Kubernetes HPA with thresholds for 50% CPU utilization or request latency above 500ms.
- Large Deployment:
- Cluster of multiple AWS t3.2xlarge instances or equivalent.
- Each instance handles 160 concurrent requests, scaling to thousands of users with multiple instances.
- Enterprises with over 200 users and variable, high-concurrency workloads.
- Use HPA to scale pods based on real-time demand, combined with node autoscaling for cluster-wide resource adjustments.
What specs does the AI gateway container have access to, and how does resource allocation affect performance?
The AI Gateway operates effectively under the following resource allocations:
- 2 CPU cores and 8 GB of RAM per container.
- Containers typically utilize about 7.39% CPU and proportionate memory in GitLab’s production environment, leaving room for growth or handling burst activity.
Mitigation strategies for resource contention
- Use Kubernetes resource requests and limits to ensure AIGW containers receive guaranteed CPU and memory allocations. For example:
resources:
requests:
memory: "16Gi"
cpu: "4"
limits:
memory: "32Gi"
cpu: "8"
- Use monitoring tools like Prometheus and Grafana to track resource utilization (CPU, memory, latency) and detect bottlenecks early.
- Dedicate nodes or instances exclusively to the AI Gateway to prevent resource competition with other services.
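One way to dedicate nodes is to label and taint them, then schedule only the AI gateway pods there (a sketch; the label and taint names are illustrative):

  kubectl label node <node-name> workload=ai-gateway
  kubectl taint node <node-name> dedicated=ai-gateway:NoSchedule

Then, in the pod spec (or through the chart's values, if it exposes them):

  nodeSelector:
    workload: ai-gateway
  tolerations:
    - key: dedicated
      operator: Equal
      value: ai-gateway
      effect: NoSchedule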
Scaling strategies
- Use Kubernetes HPA to scale pods based on real-time metrics like:
- Average CPU utilization exceeding 50%.
- Request latency consistently above 500ms.
- Enable node autoscaling to scale infrastructure resources dynamically as pods increase.
Scaling recommendations
| Deployment size | Instance type | Topology | Capacity (concurrent requests) | Scaling recommendation |
|---|---|---|---|---|
| Small | Any with 2 vCPUs, 8 GB RAM | Single instance | 40 | Fixed deployment; no autoscaling. |
| Medium | AWS t3.2xlarge (8 vCPUs, 32 GB RAM) | Single instance | 160 | HPA based on CPU or latency thresholds. |
| Large | AWS t3.2xlarge or equivalent | Clustered instances | 160 per instance | HPA plus node autoscaling for high demand. |