GitLab Duo Self-Hosted supported platforms
Offering: GitLab Self-Managed
Status: Beta
-
Introduced in GitLab 17.1 with a flag named
ai_custom_model
. Disabled by default. - Enabled on GitLab Self-Managed in GitLab 17.6.
- Changed to require GitLab Duo add-on in GitLab 17.6 and later.
- Feature flag
ai_custom_model
removed in GitLab 17.8
There are multiple platforms available to host your self-hosted Large Language Models (LLMs). Each platform has unique features and benefits that can cater to different needs. The following documentation summarises the currently supported options:
For self-hosted model deployments
vLLM
vLLM is a high-performance inference server optimized for serving LLMs with memory efficiency. It supports model parallelism and integrates easily with existing workflows.
To install vLLM, see the vLLM Installation Guide. You should install version v0.6.4.post1 or later.
For more information on:
- vLLM supported models, see the vLLM Supported Models documentation.
- Available options when using vLLM to run a model, see the vLLM documentation on engine arguments.
- The hardware needed for the models, see the Supported models and Hardware requirements documentation.
Examples:
Mistral-7B-Instruct-v0.2
-
Download the model from HuggingFace:
git clone https://<your-hugging-face-username>:<your-hugging-face-token>@huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
-
Run the server:
vllm serve <path-to-model>/Mistral-7B-Instruct-v0.3 \ --served_model_name <choose-a-name-for-the-model> \ --tokenizer_mode mistral \ --tensor_parallel_size <number-of-gpus> \ --load_format mistral \ --config_format mistral \ --tokenizer <path-to-model>/Mistral-7B-Instruct-v0.3
Mixtral-8x7B-Instruct-v0.1
-
Download the model from HuggingFace:
git clone https://<your-hugging-face-username>:<your-hugging-face-token>@huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
-
Rename the token config:
cd <path-to-model>/Mixtral-8x7B-Instruct-v0.1 cp tokenizer.model tokenizer.model.v3
-
Run the model:
vllm serve <path-to-model>/Mixtral-8x7B-Instruct-v0.1 \ --tensor_parallel_size 4 \ --served_model_name <choose-a-name-for-the-model> \ --tokenizer_mode mistral \ --load_format safetensors \ --tokenizer <path-to-model>/Mixtral-8x7B-Instruct-v0.1/
For cloud-hosted model deployments
-
AWS Bedrock. A fully managed service that allows developers to build and scale generative AI applications using pre-trained models from leading AI companies. It seamlessly integrates with other AWS services and offers a pay-as-you-go pricing model.
You must configure the GitLab instance with your appropriate AWS IAM permissions before accessing Bedrock models. You cannot do this in the GitLab Duo Self-Hosted UI. For example, you can authenticate the AI Gateway instance by defining the
AWS_ACCESS_KEY_ID
,AWS_SECRET_ACCESS_KEY
andAWS_REGION_NAME
when starting the Docker image. For more information, see the AWS Identity and Access Management (IAM) Guide. -
Azure OpenAI. Provides access to OpenAI’s powerful models, enabling developers to integrate advanced AI capabilities into their applications with robust security and scalable infrastructure.