Supported GitLab Duo Self-Hosted models and hardware requirements

Tier: Ultimate with GitLab Duo Enterprise
Offering: GitLab Self-Managed
Status: Beta

The following tables show the supported models, their level of support for each GitLab Duo feature, and their hardware requirements, to help you select the model that best fits your infrastructure needs.

Approved LLMs

Install one of the following GitLab-approved large language models (LLMs):

| Model family | Model | Supported platforms | Status | Code completion | Code generation | GitLab Duo Chat |
|--------------|-------|---------------------|--------|-----------------|-----------------|-----------------|
| Mistral Codestral | Codestral 22B v0.1 | vLLM | Generally available | 🟢 Green | 🟢 Green | N/A |
| Mistral | Mistral 7B-it v0.3 | vLLM | Generally available | 🟢 Green | 🟢 Green | 🔴 Red |
| Mistral | Mixtral 8x7B-it v0.1 | vLLM, AWS Bedrock | Generally available | 🟢 Green | 🟢 Green | 🟡 Amber |
| Mistral | Mixtral 8x22B-it v0.1 | vLLM | Generally available | 🟢 Green | 🟢 Green | 🟢 Green |
| Claude 3 | Claude 3.5 Sonnet | AWS Bedrock | Generally available | 🟢 Green | 🟢 Green | 🟢 Green |
| GPT | GPT-4 Turbo | Azure OpenAI | Generally available | 🟢 Green | 🟢 Green | 🟡 Amber |
| GPT | GPT-4o | Azure OpenAI | Generally available | 🟢 Green | 🟢 Green | 🟢 Green |
| GPT | GPT-4o-mini | Azure OpenAI | Generally available | 🟢 Green | 🟢 Green | 🟡 Amber |

Legend:

  • 🟢 Green - Strongly recommended. The model can handle the feature without any loss of quality.
  • 🟡 Amber - Recommended. The model supports the feature, but there might be minor compromises or limitations.
  • 🔴 Red - Not recommended. The model is unsuitable for the feature, likely resulting in significant quality loss or performance issues.
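
Before connecting a model to GitLab, you can smoke-test it directly in vLLM. The following is a minimal sketch, assuming vLLM is installed (`pip install vllm`) and that you have access to the Mistral 7B-it v0.3 weights on Hugging Face; the prompt and sampling parameters are illustrative only.

```python
# Minimal vLLM smoke test for an approved model (illustrative sketch).
# Assumes vLLM is installed and the model weights are accessible.
from vllm import LLM, SamplingParams

# Load the model onto the local GPU (use tensor_parallel_size for multi-GPU).
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")

# A short code-generation prompt, just to confirm the model responds.
prompts = ["Write a Python function that reverses a string."]
params = SamplingParams(temperature=0.2, max_tokens=128)

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

A model that loads and responds here can then be exposed through vLLM's OpenAI-compatible server and configured as a self-hosted model in GitLab.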

The following models are under evaluation, and support is limited:

| Model family | Model | Supported platforms | Status | Code completion | Code generation | GitLab Duo Chat |
|--------------|-------|---------------------|--------|-----------------|-----------------|-----------------|
| CodeGemma | CodeGemma 2b | vLLM | Beta | Yes | No | No |
| CodeGemma | CodeGemma 7b-it | vLLM | Beta | No | Yes | No |
| CodeGemma | CodeGemma 7b-code | vLLM | Beta | Yes | No | No |
| Code Llama | Code-Llama 13b | vLLM | Beta | No | Yes | No |
| DeepSeek Coder | DeepSeek Coder 33b Instruct | vLLM | Beta | Yes | Yes | No |
| DeepSeek Coder | DeepSeek Coder 33b Base | vLLM | Beta | Yes | No | No |
| Mistral | Mistral 7B-it v0.2 | vLLM, AWS Bedrock | Beta | Yes | Yes | Yes |

Hardware requirements

The following hardware specifications are the minimum requirements for running GitLab Duo Self-Hosted on-premises. Requirements vary significantly based on model size and intended usage:

Base system requirements

  • CPU:
    • Minimum: 8 cores (16 threads)
    • Recommended: 16+ cores for production environments
  • RAM:
    • Minimum: 32 GB
    • Recommended: 64 GB for most models
  • Storage:
    • SSD with sufficient space for model weights and data.
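
To sanity-check a Linux host against these baseline figures, you can use a short script such as the following stdlib-only sketch; the thresholds simply mirror the minimums above.

```python
# Illustrative check of a Linux host against the minimum base requirements.
# Standard library only; thresholds mirror the figures listed above.
import os

MIN_CORES = 8
MIN_RAM_GB = 32

cores = os.cpu_count() or 0

# Total memory from /proc/meminfo (Linux-specific), reported in kB.
with open("/proc/meminfo") as f:
    mem_kb = next(int(line.split()[1]) for line in f if line.startswith("MemTotal"))
ram_gb = mem_kb / 1024 / 1024

print(f"CPU cores: {cores} (minimum {MIN_CORES})")
print(f"RAM: {ram_gb:.1f} GB (minimum {MIN_RAM_GB} GB)")
if cores < MIN_CORES or ram_gb < MIN_RAM_GB:
    print("This host is below the minimum base requirements.")
```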

GPU requirements by model size

| Model size | Minimum GPU configuration | Minimum VRAM required |
|------------|---------------------------|-----------------------|
| 7B models (for example, Mistral 7B) | 1x NVIDIA A100 (40 GB) | 35 GB |
| 22B models (for example, Codestral 22B) | 2x NVIDIA A100 (80 GB) | 110 GB |
| Mixtral 8x7B | 2x NVIDIA A100 (80 GB) | 220 GB |
| Mixtral 8x22B | 8x NVIDIA A100 (80 GB) | 526 GB |

Use Hugging Face’s memory utility to verify memory requirements.
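
As a rough cross-check, fp16 weights cost about 2 bytes per parameter, and the minimums in the table above sit at roughly 2.5 times that weight footprint, leaving headroom for the KV cache and runtime buffers. The sketch below uses that assumed factor, calibrated to the 7B and 22B rows; it is not a substitute for the Hugging Face utility.

```python
# Back-of-the-envelope VRAM estimate: fp16 weights (2 bytes per parameter)
# scaled by an overhead factor for KV cache and runtime buffers.
# The 2.5x factor is an assumption calibrated to the 7B and 22B rows above.
BYTES_PER_PARAM_FP16 = 2
OVERHEAD_FACTOR = 2.5

def estimate_vram_gb(params_billions: float) -> float:
    weights_gb = params_billions * BYTES_PER_PARAM_FP16  # 1e9 params * 2 B ≈ 2 GB
    return weights_gb * OVERHEAD_FACTOR

print(f"Mistral 7B:    ~{estimate_vram_gb(7):.0f} GB")   # ~35 GB
print(f"Codestral 22B: ~{estimate_vram_gb(22):.0f} GB")  # ~110 GB
```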

Response time by model size and GPU

Small machine

With an a2-highgpu-2g (2x NVIDIA A100 40 GB - 150 GB VRAM) machine on GCP or equivalent:

| Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|------------|--------------------|--------------------------------|----------------------------|---------------------------------------|-------------------------------|-----------|
| Mistral-7B-Instruct-v0.3 | 1 | 7.09 | 717.0 | 101.19 | 7.09 | 101.17 |
| Mistral-7B-Instruct-v0.3 | 10 | 8.41 | 764.2 | 90.35 | 13.70 | 557.80 |
| Mistral-7B-Instruct-v0.3 | 100 | 13.97 | 693.23 | 49.17 | 20.81 | 3331.59 |
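
Numbers like these can be approximated with a concurrent load test against the model's OpenAI-compatible endpoint. Total TPS is the sum of generated tokens divided by the wall-clock time for the whole batch, which is why it climbs with concurrency even as per-request latency rises. The following sketch assumes a vLLM server running locally; the URL, model name, prompt, and request count are illustrative.

```python
# Illustrative load test against a vLLM OpenAI-compatible endpoint.
# Assumes the server was started with something like:
#   vllm serve mistralai/Mistral-7B-Instruct-v0.3
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8000/v1/completions"  # assumption; adjust for your setup
PAYLOAD = {
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "prompt": "Write a Python function that reverses a string.",
    "max_tokens": 512,
}

def one_request() -> int:
    req = urllib.request.Request(
        URL,
        data=json.dumps(PAYLOAD).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The OpenAI-compatible response reports generated tokens in `usage`.
    return body["usage"]["completion_tokens"]

n = 10
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=n) as pool:
    tokens = list(pool.map(lambda _: one_request(), range(n)))
elapsed = time.perf_counter() - start

# Total TPS = all generated tokens divided by wall-clock time for the batch.
print(f"{n} requests in {elapsed:.2f}s, total TPS: {sum(tokens) / elapsed:.1f}")
```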

Medium machine

With an a2-ultragpu-4g (4x NVIDIA A100 80 GB - 340 GB VRAM) machine on GCP or equivalent:

| Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|------------|--------------------|--------------------------------|----------------------------|---------------------------------------|-------------------------------|-----------|
| Mistral-7B-Instruct-v0.3 | 1 | 3.80 | 499.0 | 131.25 | 3.80 | 131.23 |
| Mistral-7B-Instruct-v0.3 | 10 | 6.00 | 740.6 | 122.85 | 8.19 | 904.22 |
| Mistral-7B-Instruct-v0.3 | 100 | 11.71 | 695.71 | 59.06 | 15.54 | 4477.34 |
| Mixtral-8x7B-Instruct-v0.1 | 1 | 6.50 | 400.0 | 61.55 | 6.50 | 61.53 |
| Mixtral-8x7B-Instruct-v0.1 | 10 | 16.58 | 768.9 | 40.33 | 32.56 | 236.13 |
| Mixtral-8x7B-Instruct-v0.1 | 100 | 25.90 | 767.38 | 26.87 | 55.57 | 1380.68 |

Large machine

With an a2-ultragpu-8g (8x NVIDIA A100 80 GB - 1360 GB VRAM) machine on GCP or equivalent:

| Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|------------|--------------------|--------------------------------|----------------------------|---------------------------------------|-------------------------------|-----------|
| Mistral-7B-Instruct-v0.3 | 1 | 3.23 | 479.0 | 148.41 | 3.22 | 148.36 |
| Mistral-7B-Instruct-v0.3 | 10 | 4.95 | 678.3 | 135.98 | 6.85 | 989.11 |
| Mistral-7B-Instruct-v0.3 | 100 | 10.14 | 713.27 | 69.63 | 13.96 | 5108.75 |
| Mixtral-8x7B-Instruct-v0.1 | 1 | 6.08 | 709.0 | 116.69 | 6.07 | 116.64 |
| Mixtral-8x7B-Instruct-v0.1 | 10 | 9.95 | 645.0 | 63.68 | 13.40 | 481.06 |
| Mixtral-8x7B-Instruct-v0.1 | 100 | 13.83 | 585.01 | 41.80 | 20.38 | 2869.12 |
| Mixtral-8x22B-Instruct-v0.1 | 1 | 14.39 | 828.0 | 57.56 | 14.38 | 57.55 |
| Mixtral-8x22B-Instruct-v0.1 | 10 | 20.57 | 629.7 | 30.24 | 28.02 | 224.71 |
| Mixtral-8x22B-Instruct-v0.1 | 100 | 27.58 | 592.49 | 21.34 | 36.80 | 1609.85 |

AI gateway hardware requirements

For recommendations on AI gateway hardware, see the AI gateway scaling recommendations.