Supported GitLab Duo Self-Hosted models and hardware requirements

The following tables list the supported models, the features each one supports, and the hardware needed to run them, so that you can select the model that best fits your infrastructure.

Supported models

The following GitLab-supported large language models (LLMs) are generally available.

  • Fully compatible: The model can likely handle the feature without any loss of quality.
  • Largely compatible: The model supports the feature, but there might be compromises or limitations.
  • Not compatible: The model is unsuitable for the feature, likely resulting in significant quality loss or performance issues.

| Model family | Model | Supported platforms | Code completion | Code generation | GitLab Duo Chat |
|---|---|---|---|---|---|
| Mistral Codestral | Codestral 22B v0.1 | vLLM | Fully compatible | Fully compatible | N/A |
| Mistral | Mistral 7B-it v0.3 | vLLM | Fully compatible | Fully compatible | Not compatible |
| Mistral | Mixtral 8x7B-it v0.1 | vLLM, AWS Bedrock | Fully compatible | Fully compatible | Limited compatibility |
| Mistral | Mixtral 8x22B-it v0.1 | vLLM | Fully compatible | Fully compatible | Limited compatibility |
| Claude 3 | Claude 3.5 Sonnet | AWS Bedrock | Fully compatible | Fully compatible | Fully compatible |
| GPT | GPT-4 Turbo | Azure OpenAI | Fully compatible | Fully compatible | Limited compatibility |
| GPT | GPT-4o | Azure OpenAI | Fully compatible | Fully compatible | Fully compatible |
| GPT | GPT-4o-mini | Azure OpenAI | Fully compatible | Fully compatible | Limited compatibility |
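
When scripting a model choice, the table above can be treated as data and filtered by platform and feature. The following is a minimal sketch, not part of GitLab: the `ModelSupport` class and `models_for` function are hypothetical names, and the rows are transcribed by hand from the table, so they should be checked against the current documentation before use.

```python
from dataclasses import dataclass

FULL = "Fully compatible"
LIMITED = "Limited compatibility"
NOT_COMPATIBLE = "Not compatible"
NA = "N/A"


@dataclass(frozen=True)
class ModelSupport:
    """One row of the supported-models table (hypothetical helper type)."""
    model: str
    platforms: tuple
    code_completion: str
    code_generation: str
    duo_chat: str


# Rows transcribed from the table above.
SUPPORTED = [
    ModelSupport("Codestral 22B v0.1", ("vLLM",), FULL, FULL, NA),
    ModelSupport("Mistral 7B-it v0.3", ("vLLM",), FULL, FULL, NOT_COMPATIBLE),
    ModelSupport("Mixtral 8x7B-it v0.1", ("vLLM", "AWS Bedrock"), FULL, FULL, LIMITED),
    ModelSupport("Mixtral 8x22B-it v0.1", ("vLLM",), FULL, FULL, LIMITED),
    ModelSupport("Claude 3.5 Sonnet", ("AWS Bedrock",), FULL, FULL, FULL),
    ModelSupport("GPT-4 Turbo", ("Azure OpenAI",), FULL, FULL, LIMITED),
    ModelSupport("GPT-4o", ("Azure OpenAI",), FULL, FULL, FULL),
    ModelSupport("GPT-4o-mini", ("Azure OpenAI",), FULL, FULL, LIMITED),
]


def models_for(platform, feature, level=FULL):
    """Return model names on `platform` whose `feature` column equals `level`.

    `feature` is one of "code_completion", "code_generation", "duo_chat".
    """
    return [
        row.model
        for row in SUPPORTED
        if platform in row.platforms and getattr(row, feature) == level
    ]


if __name__ == "__main__":
    # Models on AWS Bedrock that are fully compatible with GitLab Duo Chat.
    print(models_for("AWS Bedrock", "duo_chat"))
    # Models on vLLM with limited GitLab Duo Chat compatibility.
    print(models_for("vLLM", "duo_chat", level=LIMITED))
```

Filtering on an exact compatibility level keeps the sketch simple; a real selection script would probably also rank the levels so that it could ask for "limited or better".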

Experimental and beta models

The following models are configurable for the functionalities marked below, but are currently in experimental or beta status, under evaluation, and are excluded from the “Customer Integrated Models” definition in the AI Functionality Terms:

| Model family | Model | Supported platforms | Status | Code completion | Code generation | GitLab Duo Chat |
|---|---|---|---|---|---|---|
| CodeGemma | CodeGemma 2b | vLLM | Beta | Yes | No | No |
| CodeGemma | CodeGemma 7b-it | vLLM | Beta | No | Yes | No |
| CodeGemma | CodeGemma 7b-code | vLLM | Beta | Yes | No | No |
| Code Llama | Code-Llama 13b | vLLM | Beta | No | Yes | No |
| DeepSeek Coder | DeepSeek Coder 33b Instruct | vLLM | Beta | Yes | Yes | No |
| DeepSeek Coder | DeepSeek Coder 33b Base | vLLM | Beta | Yes | No | No |
| Mistral | Mistral 7B-it v0.2 | vLLM, AWS Bedrock | Beta | Yes | Yes | Yes |

Hardware requirements

The following hardware specifications are the minimum requirements for running GitLab Duo Self-Hosted on-premises. Requirements vary significantly with model size and intended usage.

Base system requirements

  • CPU:
    • Minimum: 8 cores (16 threads)
    • Recommended: 16+ cores for production environments
  • RAM:
    • Minimum: 32 GB
    • Recommended: 64 GB for most models
  • Storage:
    • SSD with sufficient space for model weights and data.

GPU requirements by model size

| Model size | Minimum GPU configuration | Minimum VRAM required |
|---|---|---|
| 7B models (for example, Mistral 7B) | 1x NVIDIA A100 (40 GB) | 35 GB |
| 22B models (for example, Codestral 22B) | 2x NVIDIA A100 (80 GB) | 110 GB |
| Mixtral 8x7B | 2x NVIDIA A100 (80 GB) | 220 GB |
| Mixtral 8x22B | 8x NVIDIA A100 (80 GB) | 526 GB |

Use Hugging Face’s memory utility to verify memory requirements.
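
As a quick sanity check before running the Hugging Face utility, the memory needed for the model weights alone can be estimated from the parameter count. The sketch below is not that utility. It assumes 16-bit weights (2 bytes per parameter) by default and ignores the KV cache, activations, and runtime overhead, so it gives a lower bound only; the function name is illustrative.

```python
def weights_memory_gb(params_billions, bytes_per_param=2.0):
    """Estimate memory for model weights only, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at `bytes_per_param` bytes
    (2.0 for fp16/bf16, 4.0 for fp32, 1.0 for 8-bit quantization).
    KV cache, activations, and framework overhead are NOT included.
    """
    total_bytes = params_billions * 1e9 * bytes_per_param
    return total_bytes / 1e9


if __name__ == "__main__":
    for name, size_b in [("7B model", 7), ("22B model", 22)]:
        fp16 = weights_memory_gb(size_b)
        fp32 = weights_memory_gb(size_b, bytes_per_param=4.0)
        print(f"{name}: ~{fp16:.0f} GB at 16-bit, ~{fp32:.0f} GB at 32-bit (weights only)")
```

The minimum VRAM figures in the table are higher than the 16-bit weights-only estimates, which is expected for a serving setup that also needs room for the KV cache and concurrent requests.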

Response time by model size and GPU

Small machine

With an a2-highgpu-2g (2x NVIDIA A100 40 GB - 150 GB vRAM) machine on GCP or equivalent:

| Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|---|---|---|---|---|---|---|
| Mistral-7B-Instruct-v0.3 | 1 | 7.09 | 717.0 | 101.19 | 7.09 | 101.17 |
| Mistral-7B-Instruct-v0.3 | 10 | 8.41 | 764.2 | 90.35 | 13.70 | 557.80 |
| Mistral-7B-Instruct-v0.3 | 100 | 13.97 | 693.23 | 49.17 | 20.81 | 3331.59 |

Medium machine

With an a2-ultragpu-4g (4x NVIDIA A100 40 GB - 340 GB vRAM) machine on GCP or equivalent:

| Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|---|---|---|---|---|---|---|
| Mistral-7B-Instruct-v0.3 | 1 | 3.80 | 499.0 | 131.25 | 3.80 | 131.23 |
| Mistral-7B-Instruct-v0.3 | 10 | 6.00 | 740.6 | 122.85 | 8.19 | 904.22 |
| Mistral-7B-Instruct-v0.3 | 100 | 11.71 | 695.71 | 59.06 | 15.54 | 4477.34 |
| Mixtral-8x7B-Instruct-v0.1 | 1 | 6.50 | 400.0 | 61.55 | 6.50 | 61.53 |
| Mixtral-8x7B-Instruct-v0.1 | 10 | 16.58 | 768.9 | 40.33 | 32.56 | 236.13 |
| Mixtral-8x7B-Instruct-v0.1 | 100 | 25.90 | 767.38 | 26.87 | 55.57 | 1380.68 |

Large machine

With an a2-ultragpu-8g (8x NVIDIA A100 80 GB - 1360 GB vRAM) machine on GCP or equivalent:

| Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|---|---|---|---|---|---|---|
| Mistral-7B-Instruct-v0.3 | 1 | 3.23 | 479.0 | 148.41 | 3.22 | 148.36 |
| Mistral-7B-Instruct-v0.3 | 10 | 4.95 | 678.3 | 135.98 | 6.85 | 989.11 |
| Mistral-7B-Instruct-v0.3 | 100 | 10.14 | 713.27 | 69.63 | 13.96 | 5108.75 |
| Mixtral-8x7B-Instruct-v0.1 | 1 | 6.08 | 709.0 | 116.69 | 6.07 | 116.64 |
| Mixtral-8x7B-Instruct-v0.1 | 10 | 9.95 | 645.0 | 63.68 | 13.40 | 481.06 |
| Mixtral-8x7B-Instruct-v0.1 | 100 | 13.83 | 585.01 | 41.80 | 20.38 | 2869.12 |
| Mixtral-8x22B-Instruct-v0.1 | 1 | 14.39 | 828.0 | 57.56 | 14.38 | 57.55 |
| Mixtral-8x22B-Instruct-v0.1 | 10 | 20.57 | 629.7 | 30.24 | 28.02 | 224.71 |
| Mixtral-8x22B-Instruct-v0.1 | 100 | 27.58 | 592.49 | 21.34 | 36.80 | 1609.85 |
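
The source does not state how the Total TPS column is derived, but the reported values are consistent, to within rounding, with dividing the total number of generated tokens by the total wall-clock time for the batch. The sketch below works under that assumption; the function names are illustrative, and the inputs are taken from the Mistral-7B-Instruct-v0.3 rows for the small machine.

```python
def total_tps(num_requests, avg_tokens_per_response, total_time_sec):
    """Aggregate tokens per second across a batch of concurrent requests.

    Assumed relationship (not stated in the source):
        total TPS = (requests * average tokens per response) / total time
    """
    return num_requests * avg_tokens_per_response / total_time_sec


def throughput_gain(tps_batch, tps_single):
    """How many times higher the batch throughput is than a single request."""
    return tps_batch / tps_single


if __name__ == "__main__":
    single = total_tps(1, 717.0, 7.09)      # reported as 101.17
    batch_10 = total_tps(10, 764.2, 13.70)  # reported as 557.80
    batch_100 = total_tps(100, 693.23, 20.81)  # reported as 3331.59

    print(f"1 request:    {single:.1f} tokens/sec")
    print(f"10 requests:  {batch_10:.1f} tokens/sec")
    print(f"100 requests: {batch_100:.1f} tokens/sec")
    print(f"gain at 100 requests: {throughput_gain(batch_100, single):.1f}x")
```

Read this way, the tables show the usual batching trade-off: per-request latency rises as concurrency grows, while aggregate throughput rises much faster.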

AI gateway hardware requirements

For recommendations on AI gateway hardware, see the AI gateway scaling recommendations.