# Supported GitLab Duo Self-Hosted models and hardware requirements
- Tier: Premium, Ultimate
- Add-on: GitLab Duo Enterprise
- Offering: GitLab Self-Managed
GitLab Duo Self-Hosted supports integration with industry-leading models from Mistral, Meta, Anthropic, and OpenAI through your preferred serving platform.
You can choose from these supported models to match your specific performance needs and use cases.
In GitLab 18.3 and later, you can also use your own compatible model, giving you the flexibility to experiment with additional language models beyond the officially supported options.
## Supported models
GitLab-supported models offer different levels of functionality for GitLab Duo features, depending on the specific model and feature combination.
- Full functionality: The model can likely handle the feature without any loss of quality.
- Partial functionality: The model supports the feature, but there might be compromises or limitations.
- Limited functionality: The model is unsuitable for the feature, likely resulting in significant quality loss or performance issues. Models that have limited functionality for a feature will not receive GitLab support for that specific feature.
Model family | Model | Supported platforms | Code completion | Code generation | GitLab Duo Chat | GitLab Duo Agent Platform |
---|---|---|---|---|---|---|
Mistral Codestral | Codestral 22B v0.1 | vLLM | Full functionality | Full functionality | Partial functionality | Limited functionality |
Mistral | Mistral Small 24B Instruct 2506 | vLLM | Full functionality | Full functionality | Full functionality | Limited functionality |
Claude 3 | Claude 3.5 Sonnet | AWS Bedrock | Full functionality | Full functionality | Full functionality | Partial functionality |
Claude 3 | Claude 3.7 Sonnet | AWS Bedrock | Full functionality | Full functionality | Full functionality | Partial functionality |
Claude 4 | Claude 4 Sonnet | AWS Bedrock | Full functionality | Full functionality | Full functionality | Full functionality |
GPT | GPT-4 Turbo | Azure OpenAI | Full functionality | Full functionality | Partial functionality | Limited functionality |
GPT | GPT-4o | Azure OpenAI | Full functionality | Full functionality | Full functionality | Limited functionality |
GPT | GPT-4o-mini | Azure OpenAI | Full functionality | Full functionality | Partial functionality | Limited functionality |
GPT | GPT-5 | Azure OpenAI | Full functionality | Full functionality | Full functionality | Limited functionality |
GPT | GPT-oss-120B | vLLM | Full functionality | Full functionality | Full functionality | Limited functionality |
GPT | GPT-oss-20B | vLLM | Partial functionality | Partial functionality | Partial functionality | Limited functionality |
Llama | Llama 3 8B | vLLM | Partial functionality | Full functionality | Limited functionality | Limited functionality |
Llama | Llama 3.1 8B | vLLM | Partial functionality | Full functionality | Partial functionality | Limited functionality |
Llama | Llama 3 70B | vLLM | Partial functionality | Full functionality | Limited functionality | Limited functionality |
Llama | Llama 3.1 70B | vLLM | Full functionality | Full functionality | Full functionality | Limited functionality |
Llama | Llama 3.3 70B | vLLM | Full functionality | Full functionality | Full functionality | Limited functionality |
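For the vLLM-served models in this table, a quick way to confirm that a model loads on your hardware is the vLLM offline Python API. The following is a minimal sketch; the Hugging Face model ID is an assumed example, and GitLab Duo Self-Hosted itself connects to vLLM's OpenAI-compatible server rather than this offline API.

```python
# Minimal vLLM smoke test: load the model and generate one completion.
# The model ID below is an assumption for illustration; substitute the
# Hugging Face ID of the model you intend to serve.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed example model ID
    tensor_parallel_size=1,  # increase to shard the model across multiple GPUs
)
sampling = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(["Write a Python function that reverses a string."], sampling)
print(outputs[0].outputs[0].text)
```

In production, you would serve the same model through vLLM's OpenAI-compatible HTTP server and point your GitLab instance at that endpoint.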
## Compatible models
- Status: Beta
You can use your own compatible models and platform with GitLab Duo features. For compatible models not included in the supported model families, use the General model family.
Compatible models are excluded from the definition of Customer Integrated Models in the AI Functionality Terms. Compatible models and platforms must adhere to the OpenAI API specification (a minimal verification sketch follows the table below). Models and platforms previously marked as experimental or beta are now considered compatible models.
This feature is in beta and is therefore subject to change as we gather feedback and improve the integration:
- GitLab does not provide technical support for issues specific to your chosen model or platform.
- Not all GitLab Duo features are guaranteed to work optimally with every compatible model.
- Response quality, speed, and overall performance might vary significantly based on your model choice.
Model family | Model requirements | Supported platforms |
---|---|---|
General | Any model compatible with the OpenAI API specification | Any platform that provides OpenAI-compatible API endpoints |
CodeGemma | CodeGemma 2b | vLLM |
CodeGemma | CodeGemma 7b-it | vLLM |
CodeGemma | CodeGemma 7b-code | vLLM |
Code Llama | Code-Llama 13b | vLLM |
DeepSeek Coder | DeepSeek Coder 33b Instruct | vLLM |
DeepSeek Coder | DeepSeek Coder 33b Base | vLLM |
Mistral | Mistral 7B-it v0.2 | vLLM, AWS Bedrock |
Mistral | Mistral 7B-it v0.3 ¹ | vLLM |
Mistral | Mixtral 8x7B-it v0.1 ¹ | vLLM, AWS Bedrock |
Mistral | Mixtral 8x22B-it v0.1 ¹ | vLLM |
Footnotes:
1. Support for this model was removed in GitLab 18.5. You should use Mistral Small 24B Instruct 2506 instead.
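Before configuring a compatible platform, you might want to confirm that it actually speaks the OpenAI chat completions API. The following is a minimal sketch using the official `openai` Python client; the base URL, API key, and model name are placeholders for your own deployment.

```python
# Smoke test for an OpenAI-compatible serving platform.
# base_url, api_key, and model are placeholders; replace them with the
# values for your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # your platform's OpenAI-compatible endpoint
    api_key="EMPTY",  # some self-hosted platforms accept any non-empty key
)

response = client.chat.completions.create(
    model="my-served-model",  # hypothetical name; use the model ID your platform serves
    messages=[{"role": "user", "content": "Reply with the single word: pong"}],
    max_tokens=16,
)
print(response.choices[0].message.content)
```

If this round-trip succeeds, the platform meets the basic API contract that GitLab Duo Self-Hosted expects from a compatible model.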
## GitLab AI vendor models
- Status: Beta
The availability of this feature is controlled by a feature flag. For more information, see the history.
GitLab AI vendor models integrate with GitLab-hosted AI gateway infrastructure to provide access to AI models curated and made available by GitLab. Instead of using your own self-hosted models, you can choose to use GitLab AI vendor models for specific GitLab Duo features.
To choose which features use GitLab AI vendor models, see Configure GitLab AI vendor models.
When enabled for a specific feature:
- All calls to features configured with a GitLab AI vendor model use the GitLab-hosted AI gateway, not the self-hosted AI gateway.
- No detailed logs are generated in the GitLab-hosted AI gateway, even when AI logs are enabled. This prevents unintended leaks of sensitive information.
## Hardware requirements
The following hardware specifications are the minimum requirements for running GitLab Duo Self-Hosted on-premises. Requirements vary significantly based on model size and intended usage.
### Base system requirements
- CPU:
- Minimum: 8 cores (16 threads)
- Recommended: 16+ cores for production environments
- RAM:
- Minimum: 32 GB
- Recommended: 64 GB for most models
- Storage:
- SSD with sufficient space for model weights and data.
### GPU requirements by model size
Model size | Minimum GPU configuration | Minimum VRAM required |
---|---|---|
7B models (for example, Mistral 7B) | 1x NVIDIA A100 (40 GB) | 35 GB |
22B models (for example, Codestral 22B) | 2x NVIDIA A100 (80 GB) | 110 GB |
Mixtral 8x7B | 2x NVIDIA A100 (80 GB) | 220 GB |
Mixtral 8x22B | 8x NVIDIA A100 (80 GB) | 526 GB |
Use Hugging Face’s memory utility to verify memory requirements.
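As a rough cross-check of the table above, the following sketch estimates VRAM from parameter count, assuming 16-bit weights and a flat 20% allowance for KV cache and runtime buffers. This is a heuristic only; real requirements depend on context length, batch size, and serving stack, which is why the table's figures include additional headroom.

```python
# Rule-of-thumb VRAM estimate: weights at 16-bit precision (2 bytes per
# parameter) plus a flat 20% allowance for KV cache, activations, and
# runtime buffers. A heuristic sketch, not a substitute for the
# Hugging Face memory utility mentioned above.
def estimate_vram_gb(params_billions: float,
                     bytes_per_param: int = 2,
                     overhead: float = 0.2) -> float:
    weights_gb = params_billions * bytes_per_param  # 1B params * 2 bytes ~= 2 GB
    return weights_gb * (1 + overhead)

# Total parameter counts; Mixtral figures are totals across all experts.
for name, size_b in [("Mistral 7B", 7.0), ("Codestral 22B", 22.0),
                     ("Mixtral 8x7B", 46.7), ("Mixtral 8x22B", 141.0)]:
    print(f"{name}: ~{estimate_vram_gb(size_b):.0f} GB")
```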
### Response time by model size and GPU
#### Small machine
With an `a2-highgpu-2g` (2 x NVIDIA A100 40 GB, 150 GB VRAM) machine on GCP or equivalent:
Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
---|---|---|---|---|---|---|
Mistral-7B-Instruct-v0.3 | 1 | 7.09 | 717.0 | 101.19 | 7.09 | 101.17 |
Mistral-7B-Instruct-v0.3 | 10 | 8.41 | 764.2 | 90.35 | 13.70 | 557.80 |
Mistral-7B-Instruct-v0.3 | 100 | 13.97 | 693.23 | 49.17 | 20.81 | 3331.59 |
#### Medium machine
With an `a2-ultragpu-4g` (4 x NVIDIA A100 40 GB, 340 GB VRAM) machine on GCP or equivalent:
Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
---|---|---|---|---|---|---|
Mistral-7B-Instruct-v0.3 | 1 | 3.80 | 499.0 | 131.25 | 3.80 | 131.23 |
Mistral-7B-Instruct-v0.3 | 10 | 6.00 | 740.6 | 122.85 | 8.19 | 904.22 |
Mistral-7B-Instruct-v0.3 | 100 | 11.71 | 695.71 | 59.06 | 15.54 | 4477.34 |
Mixtral-8x7B-Instruct-v0.1 | 1 | 6.50 | 400.0 | 61.55 | 6.50 | 61.53 |
Mixtral-8x7B-Instruct-v0.1 | 10 | 16.58 | 768.9 | 40.33 | 32.56 | 236.13 |
Mixtral-8x7B-Instruct-v0.1 | 100 | 25.90 | 767.38 | 26.87 | 55.57 | 1380.68 |
#### Large machine
With an `a2-ultragpu-8g` (8 x NVIDIA A100 80 GB, 1360 GB VRAM) machine on GCP or equivalent:
Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
---|---|---|---|---|---|---|
Mistral-7B-Instruct-v0.3 | 1 | 3.23 | 479.0 | 148.41 | 3.22 | 148.36 |
Mistral-7B-Instruct-v0.3 | 10 | 4.95 | 678.3 | 135.98 | 6.85 | 989.11 |
Mistral-7B-Instruct-v0.3 | 100 | 10.14 | 713.27 | 69.63 | 13.96 | 5108.75 |
Mixtral-8x7B-Instruct-v0.1 | 1 | 6.08 | 709.0 | 116.69 | 6.07 | 116.64 |
Mixtral-8x7B-Instruct-v0.1 | 10 | 9.95 | 645.0 | 63.68 | 13.40 | 481.06 |
Mixtral-8x7B-Instruct-v0.1 | 100 | 13.83 | 585.01 | 41.80 | 20.38 | 2869.12 |
Mixtral-8x22B-Instruct-v0.1 | 1 | 14.39 | 828.0 | 57.56 | 14.38 | 57.55 |
Mixtral-8x22B-Instruct-v0.1 | 10 | 20.57 | 629.7 | 30.24 | 28.02 | 224.71 |
Mixtral-8x22B-Instruct-v0.1 | 100 | 27.58 | 592.49 | 21.34 | 36.80 | 1609.85 |
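The following sketch shows how figures like those in these tables can be reproduced against any OpenAI-compatible endpoint: send a batch of concurrent requests and derive average latency, average tokens per response, and total tokens per second. The base URL and served model name are placeholders for your own deployment.

```python
# Sketch of a concurrent throughput benchmark against an
# OpenAI-compatible endpoint. base_url and MODEL are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "mistralai/Mistral-7B-Instruct-v0.3"  # assumed served model
N_REQUESTS = 10

def one_request(prompt: str) -> tuple[float, int]:
    """Return (elapsed seconds, completion tokens) for a single request."""
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=768,
    )
    return time.perf_counter() - start, resp.usage.completion_tokens

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=N_REQUESTS) as pool:
    results = list(pool.map(one_request,
                            ["Explain CI/CD pipelines in one paragraph."] * N_REQUESTS))
wall_time = time.perf_counter() - t0

avg_latency = sum(t for t, _ in results) / N_REQUESTS
avg_tokens = sum(n for _, n in results) / N_REQUESTS
total_tps = sum(n for _, n in results) / wall_time
print(f"avg time per request: {avg_latency:.2f} s")
print(f"avg tokens in response: {avg_tokens:.1f}")
print(f"total TPS: {total_tps:.1f}")
```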
## AI gateway hardware requirements
For recommendations on AI gateway hardware, see the AI gateway scaling recommendations.