# Supported GitLab Duo Self-Hosted models and hardware requirements

Offering: GitLab Self-Managed
Status: Beta
- Introduced in GitLab 17.1 with a flag named `ai_custom_model`. Disabled by default.
- Enabled on GitLab Self-Managed in GitLab 17.6.
- Changed to require GitLab Duo add-on in GitLab 17.6 and later.
- Feature flag `ai_custom_model` removed in GitLab 17.8.
The following tables show the supported models, the features each model supports, and the hardware required to run them, to help you select the model that best fits your infrastructure.
## Approved LLMs
Install one of the following GitLab-approved large language models (LLMs):
Model family | Model | Supported platforms | Status | Code completion | Code generation | GitLab Duo Chat |
---|---|---|---|---|---|---|
Mistral Codestral | Codestral 22B v0.1 | vLLM | Generally available | 🟢 Green | 🟢 Green | N/A |
Mistral | Mistral 7B-it v0.3 | vLLM | Generally available | 🟢 Green | 🟢 Green | 🔴 Red |
Mistral | Mixtral 8x7B-it v0.1 | vLLM, AWS Bedrock | Generally available | 🟢 Green | 🟢 Green | 🟡 Amber |
Mistral | Mixtral 8x22B-it v0.1 | vLLM | Generally available | 🟢 Green | 🟢 Green | 🟢 Green |
Claude 3 | Claude 3.5 Sonnet | AWS Bedrock | Generally available | 🟢 Green | 🟢 Green | 🟢 Green |
GPT | GPT-4 Turbo | Azure OpenAI | Generally available | 🟢 Green | 🟢 Green | 🟡 Amber |
GPT | GPT-4o | Azure OpenAI | Generally available | 🟢 Green | 🟢 Green | 🟢 Green |
GPT | GPT-4o-mini | Azure OpenAI | Generally available | 🟢 Green | 🟢 Green | 🟡 Amber |
Legend:
- 🟢 Green - Strongly recommended. The model can handle the feature without any loss of quality.
- 🟡 Amber - Recommended. The model supports the feature, but there might be minor compromises or limitations.
- 🔴 Red - Not recommended. The model is unsuitable for the feature, likely resulting in significant quality loss or performance issues.
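Before connecting an approved model to GitLab, you might want to confirm that it loads and generates on your hardware. The following minimal sketch uses vLLM's offline Python API; the model name, prompt, and sampling settings are illustrative, not GitLab-specific:

```python
# Minimal smoke test with vLLM's offline Python API (illustrative sketch).
# Assumes vLLM is installed and the GPUs have enough VRAM for the model.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.3",  # any vLLM-served model from the table
    tensor_parallel_size=1,                      # increase on multi-GPU machines
)
params = SamplingParams(temperature=0.2, max_tokens=128)
outputs = llm.generate(["Write a function that parses a CSV line."], params)
print(outputs[0].outputs[0].text)
```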
The following models are under evaluation, and support is limited:
Model family | Model | Supported platforms | Status | Code completion | Code generation | GitLab Duo Chat |
---|---|---|---|---|---|---|
CodeGemma | CodeGemma 2b | vLLM | Beta | Yes | No | No |
CodeGemma | CodeGemma 7b-it | vLLM | Beta | No | Yes | No |
CodeGemma | CodeGemma 7b-code | vLLM | Beta | Yes | No | No |
Code Llama | Code-Llama 13b | vLLM | Beta | No | Yes | No |
DeepSeek Coder | DeepSeek Coder 33b Instruct | vLLM | Beta | Yes | Yes | No |
DeepSeek Coder | DeepSeek Coder 33b Base | vLLM | Beta | Yes | No | No |
Mistral | Mistral 7B-it v0.2 | vLLM, AWS Bedrock | Beta | Yes | Yes | Yes |
## Hardware requirements
The following hardware specifications are the minimum requirements for running GitLab Duo Self-Hosted on-premises. Requirements vary significantly based on the model size and intended usage:
### Base system requirements
- **CPU**:
  - Minimum: 8 cores (16 threads)
  - Recommended: 16+ cores for production environments
- **RAM**:
  - Minimum: 32 GB
  - Recommended: 64 GB for most models
- **Storage**:
  - SSD with sufficient space for model weights and data
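As a quick way to compare a host against these minimums, you can read the core count and memory from Python. This is a sketch that assumes the third-party `psutil` package, which is not something GitLab requires:

```python
# Quick host inventory check against the minimums above (illustrative).
import os

import psutil  # third-party; install with `pip install psutil`

cores = os.cpu_count() or 0
ram_gb = psutil.virtual_memory().total / 1024**3
print(f"CPU cores: {cores} (minimum 8, recommended 16+)")
print(f"RAM: {ram_gb:.0f} GB (minimum 32 GB, recommended 64 GB)")
```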
### GPU requirements by model size
Model size | Minimum GPU configuration | Minimum VRAM required |
---|---|---|
7B models (for example, Mistral 7B) | 1 x NVIDIA A100 (40 GB) | 35 GB |
22B models (for example, Codestral 22B) | 2 x NVIDIA A100 (80 GB) | 110 GB |
Mixtral 8x7B | 2 x NVIDIA A100 (80 GB) | 220 GB |
Mixtral 8x22B | 8 x NVIDIA A100 (80 GB) | 526 GB |
Use Hugging Face’s memory utility to verify memory requirements.
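As a first approximation before running the memory utility, you can estimate the weights-only footprint from the parameter count: at 16-bit precision, each parameter takes 2 bytes. The minimums in the table above are higher because they also budget for the KV cache, batching, and runtime overhead. A sketch (parameter counts are approximate):

```python
# Weights-only VRAM estimate at 16-bit precision (2 bytes per parameter).
# Serving needs considerably more: the KV cache, activations, and CUDA
# context are why the table's minimums exceed these figures.
def weights_vram_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

for name, size in [("Mistral 7B", 7), ("Codestral 22B", 22), ("Mixtral 8x22B", 141)]:
    print(f"{name}: ~{weights_vram_gb(size):.0f} GB for weights alone")
```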
### Response time by model size and GPU
#### Small machine

With an `a2-highgpu-2g` (2 x NVIDIA A100 40 GB - 150 GB vRAM) machine on GCP or equivalent:
Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
---|---|---|---|---|---|---|
Mistral-7B-Instruct-v0.3 | 1 | 7.09 | 717.0 | 101.19 | 7.09 | 101.17 |
Mistral-7B-Instruct-v0.3 | 10 | 8.41 | 764.2 | 90.35 | 13.70 | 557.80 |
Mistral-7B-Instruct-v0.3 | 100 | 13.97 | 693.23 | 49.17 | 20.81 | 3331.59 |
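The throughput columns are related by a simple identity: total TPS is approximately the number of requests multiplied by the average tokens per response, divided by the total time for the batch. For example, checking the 100-request Mistral row above (small differences come from rounding in the published figures):

```python
# Relate the columns: total TPS ~= requests x average tokens / total time.
requests, avg_tokens, total_time_s = 100, 693.23, 20.81  # Mistral-7B row above
print(requests * avg_tokens / total_time_s)  # ~3331, close to the table's 3331.59
```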
#### Medium machine

With an `a2-ultragpu-4g` (4 x NVIDIA A100 40 GB - 340 GB vRAM) machine on GCP or equivalent:
Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
---|---|---|---|---|---|---|
Mistral-7B-Instruct-v0.3 | 1 | 3.80 | 499.0 | 131.25 | 3.80 | 131.23 |
Mistral-7B-Instruct-v0.3 | 10 | 6.00 | 740.6 | 122.85 | 8.19 | 904.22 |
Mistral-7B-Instruct-v0.3 | 100 | 11.71 | 695.71 | 59.06 | 15.54 | 4477.34 |
Mixtral-8x7B-Instruct-v0.1 | 1 | 6.50 | 400.0 | 61.55 | 6.50 | 61.53 |
Mixtral-8x7B-Instruct-v0.1 | 10 | 16.58 | 768.9 | 40.33 | 32.56 | 236.13 |
Mixtral-8x7B-Instruct-v0.1 | 100 | 25.90 | 767.38 | 26.87 | 55.57 | 1380.68 |
#### Large machine

With an `a2-ultragpu-8g` (8 x NVIDIA A100 80 GB - 1360 GB vRAM) machine on GCP or equivalent:
Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
---|---|---|---|---|---|---|
Mistral-7B-Instruct-v0.3 | 1 | 3.23 | 479.0 | 148.41 | 3.22 | 148.36 |
Mistral-7B-Instruct-v0.3 | 10 | 4.95 | 678.3 | 135.98 | 6.85 | 989.11 |
Mistral-7B-Instruct-v0.3 | 100 | 10.14 | 713.27 | 69.63 | 13.96 | 5108.75 |
Mixtral-8x7B-Instruct-v0.1 | 1 | 6.08 | 709.0 | 116.69 | 6.07 | 116.64 |
Mixtral-8x7B-Instruct-v0.1 | 10 | 9.95 | 645.0 | 63.68 | 13.40 | 481.06 |
Mixtral-8x7B-Instruct-v0.1 | 100 | 13.83 | 585.01 | 41.80 | 20.38 | 2869.12 |
Mixtral-8x22B-Instruct-v0.1 | 1 | 14.39 | 828.0 | 57.56 | 14.38 | 57.55 |
Mixtral-8x22B-Instruct-v0.1 | 10 | 20.57 | 629.7 | 30.24 | 28.02 | 224.71 |
Mixtral-8x22B-Instruct-v0.1 | 100 | 27.58 | 592.49 | 21.34 | 36.80 | 1609.85 |
## AI gateway hardware requirements
For recommendations on AI gateway hardware, see the AI gateway scaling recommendations.