Supported platforms for self-hosted models

Tier: Ultimate with GitLab Duo Enterprise
Offering: Self-managed
Status: Beta
History
  • Introduced in GitLab 17.1 with a flag named ai_custom_model. Disabled by default.
  • Enabled on self-managed in GitLab 17.6.
  • Changed to require GitLab Duo add-on in GitLab 17.6 and later.
  • Feature flag ai_custom_model removed in GitLab 17.8.

There are multiple platforms available to host your self-hosted large language models (LLMs). Each platform has unique features and benefits that suit different needs. The following sections summarize the currently supported options:

For self-hosted model deployments

  1. vLLM. A high-performance inference server optimized for serving LLMs with memory efficiency. It supports model parallelism and integrates easily with existing workflows.

    For information on available options when using vLLM to run a model, see the vLLM documentation on engine arguments.

    For example, to set up and run the Mistral-7B-Instruct-v0.3 model, run the following command. Replace HUGGING_FACE_TOKEN with your Hugging Face access token, and adjust --tensor-parallel-size to the number of GPUs available on the host:

    HF_TOKEN=HUGGING_FACE_TOKEN python -m vllm.entrypoints.openai.api_server \
       --model mistralai/Mistral-7B-Instruct-v0.3 \
       --served-model-name Mistral-7B-Instruct-v0.3 \
       --tensor-parallel-size 8 \
       --tokenizer-mode mistral \
       --load-format mistral \
       --config-format mistral \
       --tokenizer mistralai/Mistral-7B-Instruct-v0.3
    
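    After the server starts, you can verify it by sending a request to its OpenAI-compatible API. The following is a minimal sketch that assumes the server is listening on its default port, 8000, on the same host:

    curl http://localhost:8000/v1/chat/completions \
       -H "Content-Type: application/json" \
       -d '{
             "model": "Mistral-7B-Instruct-v0.3",
             "messages": [{"role": "user", "content": "Hello"}]
           }'

    The model value in the request body must match the name passed to --served-model-name.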

For cloud-hosted model deployments

  1. Amazon Bedrock. A fully managed service that allows developers to build and scale generative AI applications using pre-trained models from leading AI companies. It integrates with other AWS services and offers a pay-as-you-go pricing model. For a sketch that lists the models available to your account, see the first example after this list.
  2. Azure OpenAI. Provides access to OpenAI’s powerful models, enabling developers to integrate advanced AI capabilities into their applications with robust security and scalable infrastructure. For a minimal request sketch, see the second example after this list.
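
The following is a minimal sketch for Amazon Bedrock that lists the foundation model IDs available to your account. It assumes the AWS CLI is installed and configured with credentials that have Bedrock permissions; the region below is an example value:

    aws bedrock list-foundation-models \
       --region us-east-1 \
       --query 'modelSummaries[].modelId'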
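
A similar sketch for Azure OpenAI sends a chat completion request through the service's REST API. The resource name, deployment name, and API version are placeholders that depend on your Azure setup, and AZURE_OPENAI_API_KEY is assumed to hold one of your resource's access keys:

    curl "https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2024-02-01" \
       -H "Content-Type: application/json" \
       -H "api-key: $AZURE_OPENAI_API_KEY" \
       -d '{"messages": [{"role": "user", "content": "Hello"}]}'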