Self-hosted models supported platforms

Tier: Ultimate with GitLab Duo Enterprise
Offering: GitLab Self-Managed
Status: Beta

There are multiple platforms available to host your self-hosted large language models (LLMs). Each platform has unique features and benefits that cater to different needs. The following documentation summarizes the currently supported options:

For self-hosted model deployments

  1. vLLM. A high-performance inference server optimized for serving LLMs with memory efficiency. It supports model parallelism and integrates easily with existing workflows.

    For information on available options when using vLLM to run a model, see the vLLM documentation on engine arguments.

    For example, to serve the Mistral-7B-Instruct-v0.3 model, run the following command, replacing HUGGING_FACE_TOKEN with your Hugging Face access token:

    HF_TOKEN=HUGGING_FACE_TOKEN python -m vllm.entrypoints.openai.api_server \
       --model mistralai/Mistral-7B-Instruct-v0.3 \
       --served-model-name Mistral-7B-Instruct-v0.3 \
       --tensor-parallel-size 8 \
       --tokenizer-mode mistral \
       --load-format mistral \
       --config-format mistral \
       --tokenizer mistralai/Mistral-7B-Instruct-v0.3
    
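    After the server starts, you can verify that the model is being served by sending a request to the OpenAI-compatible API that vLLM exposes. This is a minimal check that assumes the default port 8000 and the served model name from the command above:

    # Send a test chat completion request to the vLLM OpenAI-compatible endpoint
    curl http://localhost:8000/v1/chat/completions \
       -H "Content-Type: application/json" \
       -d '{
         "model": "Mistral-7B-Instruct-v0.3",
         "messages": [{"role": "user", "content": "Hello"}]
       }'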

For cloud-hosted model deployments

  1. AWS Bedrock. A fully managed service that allows developers to build and scale generative AI applications using pre-trained models from leading AI companies. It seamlessly integrates with other AWS services and offers a pay-as-you-go pricing model.

    You must configure the GitLab instance with the appropriate AWS IAM permissions before you can access Bedrock models. You cannot do this in the self-hosted models UI. For example, you can authenticate the AI Gateway instance by defining the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION_NAME environment variables when starting the Docker image, as shown in the example after this list. For more information, see the AWS Identity and Access Management (IAM) Guide.

  2. Azure OpenAI. Provides access to OpenAI’s powerful models, enabling developers to integrate advanced AI capabilities into their applications with robust security and scalable infrastructure.
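
To authenticate the AI Gateway with AWS, as mentioned in the AWS Bedrock item above, you can pass the credentials as environment variables when starting the Docker image. The following is a minimal sketch: the image name is a placeholder for your AI Gateway image, and it assumes the AI Gateway's default port of 5052:

    # Start the AI Gateway container with AWS credentials so it can reach Bedrock models
    docker run -p 5052:5052 \
       -e AWS_ACCESS_KEY_ID=<your_access_key_id> \
       -e AWS_SECRET_ACCESS_KEY=<your_secret_access_key> \
       -e AWS_REGION_NAME=<your_region> \
       <ai-gateway-image>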