GitLab Duo Self-Hosted supported platforms

Several platforms can host your self-hosted large language models (LLMs). Each platform has features and benefits that suit different needs. This page summarises the currently supported options:

For self-hosted model deployments

vLLM

vLLM is a high-performance inference server optimized for serving LLMs with memory efficiency. It supports model parallelism and integrates easily with existing workflows.

To install vLLM, see the vLLM Installation Guide. Install v0.6.4.post1 or later.

Endpoint Configuration

When configuring the endpoint URL for any OpenAI API compatible platform (such as vLLM) in GitLab:

  • The URL must be suffixed with /v1
  • If using the default vLLM configuration, the endpoint URL would be https://<hostname>:8000/v1
  • If your server is configured behind a proxy or load balancer, you might not need to specify the port, in which case the URL would be https://<hostname>/v1
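
The rules above can be sketched as two small shell helpers. This is a minimal sketch, not part of GitLab or vLLM: the function names `endpoint_url` and `ensure_v1_suffix` and the host `vllm.example.com` are hypothetical and used only for illustration.

```shell
# Build the endpoint URL for an OpenAI API compatible server.
# Usage: endpoint_url <hostname> [port]
# Hypothetical helper: name and arguments are illustrative only.
endpoint_url() {
  host="$1"
  port="${2:-}"
  if [ -n "$port" ]; then
    printf 'https://%s:%s/v1\n' "$host" "$port"
  else
    # Behind a proxy or load balancer the port can often be omitted.
    printf 'https://%s/v1\n' "$host"
  fi
}

# Append /v1 to a base URL only if it is not already there.
ensure_v1_suffix() {
  url="${1%/}"   # drop one trailing slash, if any
  case "$url" in
    */v1) printf '%s\n' "$url" ;;
    *)    printf '%s/v1\n' "$url" ;;
  esac
}
```

For example, `endpoint_url vllm.example.com 8000` prints the default-port form, and `ensure_v1_suffix` can be used to normalise a URL copied from elsewhere before pasting it into GitLab.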

Finding the model name

After the model has been deployed, you can obtain the model name for the model identifier field in GitLab by querying the vLLM server’s /v1/models endpoint:

curl \
  --header "Authorization: Bearer API_KEY" \
  --header "Content-Type: application/json" \
  http://your-vllm-server:8000/v1/models

The model name is the value of the data.id field in the response.

Example response:

{
  "object": "list",
  "data": [
    {
      "id": "Mixtral-8x22B-Instruct-v0.1",
      "object": "model",
      "created": 1739421415,
      "owned_by": "vllm",
      "root": "mistralai/Mixtral-8x22B-Instruct-v0.1",
      // ... other fields ...
    }
  ]
}

In this example, the model’s id is Mixtral-8x22B-Instruct-v0.1, so you set the model identifier in GitLab to custom_openai/Mixtral-8x22B-Instruct-v0.1.
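
The mapping from the reported id to the GitLab model identifier can be sketched as a one-line helper. This is an illustration only: `model_identifier` is a hypothetical function name, and the commented pipeline assumes that `jq` is installed and that the first entry in `data` is the model you want.

```shell
# Turn a model id reported by /v1/models into the GitLab model identifier.
# Hypothetical helper: the function name is illustrative only.
model_identifier() {
  printf 'custom_openai/%s\n' "$1"
}

# Possible use with a live server (assumes jq is installed; not run here):
#   id=$(curl --silent --header "Authorization: Bearer API_KEY" http://your-vllm-server:8000/v1/models | jq --raw-output '.data[0].id')
#   model_identifier "$id"
```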

Examples:

Mistral-7B-Instruct-v0.3

  1. Download the model from Hugging Face:

    git clone https://<your-hugging-face-username>:<your-hugging-face-token>@huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
  2. Run the server:

    vllm serve <path-to-model>/Mistral-7B-Instruct-v0.3 \
      --served_model_name <choose-a-name-for-the-model> \
      --tokenizer_mode mistral \
      --tensor_parallel_size <number-of-gpus> \
      --load_format mistral \
      --config_format mistral \
      --tokenizer <path-to-model>/Mistral-7B-Instruct-v0.3

Mixtral-8x7B-Instruct-v0.1

  1. Download the model from Hugging Face:

    git clone https://<your-hugging-face-username>:<your-hugging-face-token>@huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
  2. Copy the tokenizer model file under the name tokenizer.model.v3:

    cd <path-to-model>/Mixtral-8x7B-Instruct-v0.1
    cp tokenizer.model tokenizer.model.v3
  3. Run the model:

    vllm serve <path-to-model>/Mixtral-8x7B-Instruct-v0.1 \
      --tensor_parallel_size 4 \
      --served_model_name <choose-a-name-for-the-model> \
      --tokenizer_mode mistral \
      --load_format safetensors \
      --tokenizer <path-to-model>/Mixtral-8x7B-Instruct-v0.1

For cloud-hosted model deployments

  1. AWS Bedrock. A fully managed service that allows developers to build and scale generative AI applications using pre-trained models from leading AI companies. It seamlessly integrates with other AWS services and offers a pay-as-you-go pricing model.

    You must configure the GitLab instance with the appropriate AWS IAM permissions before accessing Bedrock models. You cannot do this in the GitLab Duo Self-Hosted UI. For example, you can authenticate the AI Gateway instance by defining the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION_NAME environment variables when starting the Docker image. For more information, see the AWS Identity and Access Management (IAM) Guide.

  2. Azure OpenAI. Provides access to OpenAI’s powerful models, enabling developers to integrate advanced AI capabilities into their applications with robust security and scalable infrastructure.
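
For the AWS Bedrock option, the three variables named above can be checked before the AI Gateway container is started. The following is a minimal sketch under stated assumptions: `check_bedrock_env` is a hypothetical helper, and `<ai-gateway-image>` in the commented command is a placeholder, because this page does not name the image.

```shell
# Check that the AWS variables are set before starting the AI Gateway.
# Hypothetical helper: prints the names of missing variables and returns 1.
check_bedrock_env() {
  missing=""
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION_NAME; do
    # Read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing variables:$missing" >&2
    return 1
  fi
  return 0
}

# Possible use (image name is a placeholder, not taken from this page):
#   check_bedrock_env && docker run \
#     -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_REGION_NAME \
#     <ai-gateway-image>
```

Passing `-e NAME` without a value makes `docker run` forward the variable from the current shell, which keeps the secret out of the command line itself.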