Supported GitLab Duo Self-Hosted models and hardware requirements
- Tier: Ultimate with GitLab Duo Enterprise - Start a GitLab Duo Enterprise trial on a paid Ultimate subscription
- Offering: GitLab Self-Managed
The following tables show the supported models, the features each is compatible with, and the hardware required to run them, to help you select the model that best fits your infrastructure needs.
Supported models
The following GitLab-supported large language models (LLMs) are generally available.
- Fully compatible: The model can likely handle the feature without any loss of quality.
- Largely compatible: The model supports the feature, but there might be compromises or limitations.
- Not compatible: The model is unsuitable for the feature, likely resulting in significant quality loss or performance issues.
Model family | Model | Supported platforms | Code completion | Code generation | GitLab Duo Chat |
---|---|---|---|---|---|
Mistral Codestral | Codestral 22B v0.1 | vLLM | Fully compatible | Fully compatible | N/A |
Mistral | Mistral 7B-it v0.3 | vLLM | Fully compatible | Fully compatible | Not compatible |
Mistral | Mixtral 8x7B-it v0.1 | vLLM, AWS Bedrock | Fully compatible | Fully compatible | Largely compatible |
Mistral | Mixtral 8x22B-it v0.1 | vLLM | Fully compatible | Fully compatible | Largely compatible |
Claude 3 | Claude 3.5 Sonnet | AWS Bedrock | Fully compatible | Fully compatible | Fully compatible |
GPT | GPT-4 Turbo | Azure OpenAI | Fully compatible | Fully compatible | Largely compatible |
GPT | GPT-4o | Azure OpenAI | Fully compatible | Fully compatible | Fully compatible |
GPT | GPT-4o-mini | Azure OpenAI | Fully compatible | Fully compatible | Largely compatible |
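For models that list vLLM as a supported platform, a quick offline smoke test can confirm that the weights load and generate on your hardware before you wire the endpoint into GitLab. The following is a minimal sketch using vLLM's Python API; the model ID, prompt, and sampling values are illustrative:

```python
from vllm import LLM, SamplingParams

# Load one of the supported models. Weights are downloaded from
# Hugging Face on first use and must fit in the available GPU VRAM
# (see the hardware requirements below).
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")

# Generate a short completion to verify the deployment end to end.
params = SamplingParams(temperature=0.2, max_tokens=128)
outputs = llm.generate(["def fibonacci(n):"], params)
print(outputs[0].outputs[0].text)
```

In production you would instead run vLLM's OpenAI-compatible server and point GitLab Duo Self-Hosted at its URL.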
Experimental and beta models
The following models can be configured for the features marked below. They are in experimental or beta status, are still under evaluation, and are excluded from the “Customer Integrated Models” definition in the AI Functionality Terms:
Model family | Model | Supported platforms | Status | Code completion | Code generation | GitLab Duo Chat |
---|---|---|---|---|---|---|
CodeGemma | CodeGemma 2b | vLLM | Beta | Yes | No | No |
CodeGemma | CodeGemma 7b-it | vLLM | Beta | No | Yes | No |
CodeGemma | CodeGemma 7b-code | vLLM | Beta | Yes | No | No |
Code Llama | Code-Llama 13b | vLLM | Beta | No | Yes | No |
DeepSeek Coder | DeepSeek Coder 33b Instruct | vLLM | Beta | Yes | Yes | No |
DeepSeek Coder | DeepSeek Coder 33b Base | vLLM | Beta | Yes | No | No |
Mistral | Mistral 7B-it v0.2 | vLLM, AWS Bedrock | Beta | Yes | Yes | Yes |
Hardware requirements
The following hardware specifications are the minimum requirements for running GitLab Duo Self-Hosted on-premises. Requirements vary significantly based on the model size and intended usage:
Base system requirements
- CPU:
- Minimum: 8 cores (16 threads)
- Recommended: 16+ cores for production environments
- RAM:
- Minimum: 32 GB
- Recommended: 64 GB for most models
- Storage:
- SSD with sufficient space for model weights and data.
GPU requirements by model size
Model size | Minimum GPU configuration | Minimum VRAM required |
---|---|---|
7B models (for example, Mistral 7B) | 1x NVIDIA A100 (40GB) | 35 GB |
22B models (for example, Codestral 22B) | 2x NVIDIA A100 (80GB) | 110 GB |
Mixtral 8x7B | 2x NVIDIA A100 (80GB) | 220 GB |
Mixtral 8x22B | 8x NVIDIA A100 (80GB) | 526 GB |
Use Hugging Face’s memory utility to verify memory requirements.
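As a first approximation, the weights of a model in 16-bit precision need about two bytes per parameter; serving frameworks then add KV cache, activation, and batching overhead on top, which is why the minimums in the table above are higher than a weights-only estimate. The following is a rough sketch, with approximate parameter counts:

```python
# Back-of-the-envelope VRAM floor for hosting model weights in
# fp16/bf16 (2 bytes per parameter). Real deployments need extra
# headroom for the KV cache, activations, and batching.
def weight_vram_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

# Approximate total parameter counts for the models in the table above.
for name, params_b in [("Mistral 7B", 7.2),
                       ("Codestral 22B", 22.2),
                       ("Mixtral 8x7B", 46.7),
                       ("Mixtral 8x22B", 141.0)]:
    print(f"{name}: ~{weight_vram_gb(params_b):.0f} GB for weights alone")
```

Hugging Face's accelerate CLI also provides an `accelerate estimate-memory` command that reports a per-model memory breakdown from the model's configuration.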
Response time by model size and GPU
Small machine
With an a2-highgpu-2g machine (2 x NVIDIA A100 40 GB - 150 GB vRAM) on GCP or equivalent:
Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
---|---|---|---|---|---|---|
Mistral-7B-Instruct-v0.3 | 1 | 7.09 | 717.0 | 101.19 | 7.09 | 101.17 |
Mistral-7B-Instruct-v0.3 | 10 | 8.41 | 764.2 | 90.35 | 13.70 | 557.80 |
Mistral-7B-Instruct-v0.3 | 100 | 13.97 | 693.23 | 49.17 | 20.81 | 3331.59 |
Medium machine
With an a2-ultragpu-4g machine (4 x NVIDIA A100 40 GB - 340 GB vRAM) on GCP or equivalent:
Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
---|---|---|---|---|---|---|
Mistral-7B-Instruct-v0.3 | 1 | 3.80 | 499.0 | 131.25 | 3.80 | 131.23 |
Mistral-7B-Instruct-v0.3 | 10 | 6.00 | 740.6 | 122.85 | 8.19 | 904.22 |
Mistral-7B-Instruct-v0.3 | 100 | 11.71 | 695.71 | 59.06 | 15.54 | 4477.34 |
Mixtral-8x7B-Instruct-v0.1 | 1 | 6.50 | 400.0 | 61.55 | 6.50 | 61.53 |
Mixtral-8x7B-Instruct-v0.1 | 10 | 16.58 | 768.9 | 40.33 | 32.56 | 236.13 |
Mixtral-8x7B-Instruct-v0.1 | 100 | 25.90 | 767.38 | 26.87 | 55.57 | 1380.68 |
Large machine
With an a2-ultragpu-8g machine (8 x NVIDIA A100 80 GB - 1360 GB vRAM) on GCP or equivalent:
Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
---|---|---|---|---|---|---|
Mistral-7B-Instruct-v0.3 | 1 | 3.23 | 479.0 | 148.41 | 3.22 | 148.36 |
Mistral-7B-Instruct-v0.3 | 10 | 4.95 | 678.3 | 135.98 | 6.85 | 989.11 |
Mistral-7B-Instruct-v0.3 | 100 | 10.14 | 713.27 | 69.63 | 13.96 | 5108.75 |
Mixtral-8x7B-Instruct-v0.1 | 1 | 6.08 | 709.0 | 116.69 | 6.07 | 116.64 |
Mixtral-8x7B-Instruct-v0.1 | 10 | 9.95 | 645.0 | 63.68 | 13.40 | 481.06 |
Mixtral-8x7B-Instruct-v0.1 | 100 | 13.83 | 585.01 | 41.80 | 20.38 | 2869.12 |
Mixtral-8x22B-Instruct-v0.1 | 1 | 14.39 | 828.0 | 57.56 | 14.38 | 57.55 |
Mixtral-8x22B-Instruct-v0.1 | 10 | 20.57 | 629.7 | 30.24 | 28.02 | 224.71 |
Mixtral-8x22B-Instruct-v0.1 | 100 | 27.58 | 592.49 | 21.34 | 36.80 | 1609.85 |
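The figures in these tables can be approximated with a simple concurrent client against a vLLM OpenAI-compatible endpoint. The following is a sketch, where the endpoint URL, model ID, and prompt are placeholders rather than the exact benchmark GitLab ran:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # assumed local vLLM server
MODEL = "mistralai/Mistral-7B-Instruct-v0.3"
NUM_REQUESTS = 10  # match a table row: 1, 10, or 100

def one_request():
    """Time a single completion and return (seconds, completion tokens)."""
    start = time.perf_counter()
    resp = requests.post(ENDPOINT, json={
        "model": MODEL,
        "prompt": "Write a function that reverses a linked list.",
        "max_tokens": 1024,
    })
    elapsed = time.perf_counter() - start
    return elapsed, resp.json()["usage"]["completion_tokens"]

# Fire all requests concurrently and measure the wall-clock total.
wall_start = time.perf_counter()
with ThreadPoolExecutor(max_workers=NUM_REQUESTS) as pool:
    results = list(pool.map(lambda _: one_request(), range(NUM_REQUESTS)))
wall = time.perf_counter() - wall_start

times, tokens = zip(*results)
print(f"Average time per request (sec): {sum(times) / len(times):.2f}")
print(f"Average tokens in response: {sum(tokens) / len(tokens):.1f}")
print(f"Average tokens/sec per request: "
      f"{sum(t / s for s, t in results) / len(results):.2f}")
print(f"Total time (sec): {wall:.2f}  Total TPS: {sum(tokens) / wall:.2f}")
```

Absolute numbers depend on GPU type, tensor parallelism, and serving configuration, so expect your results to differ from the tables above.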
AI gateway hardware requirements
For recommendations on AI gateway hardware, see the AI gateway scaling recommendations.