# Set up a self-hosted large language model with LiteLLM

LiteLLM is an OpenAI-compatible proxy server. You can use LiteLLM to simplify integration with different large language models (LLMs), because the proxy exposes them all through the OpenAI API spec. This makes it easy to switch between LLMs.
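For example, one proxy configuration can route several generic model names to different backends, so a client switches models by changing only the model name in its request. The following sketch is illustrative only: the `mistral` entry is a hypothetical second model, not part of the setup described below.

```yaml
# Hypothetical config.yaml routing two generic model names to local Ollama models
model_list:
  - model_name: codegemma
    litellm_params:
      model: ollama/codegemma:2b
      api_base: http://localhost:11434
  # Hypothetical second entry, shown only to illustrate switching between models
  - model_name: mistral
    litellm_params:
      model: ollama/mistral:7b
      api_base: http://localhost:11434
```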
## On Kubernetes

In Kubernetes environments, you can install Ollama with a Helm chart or by following the example in the official documentation.
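For example, an installation with the community `ollama-helm` chart might look like the following. This is a minimal sketch using default chart values; the repository URL, release name, and namespace are assumptions, not part of the official documentation:

```shell
# Add the community Helm repository for Ollama (assumed URL)
helm repo add ollama-helm https://helm.otwld.com/
helm repo update

# Install Ollama with default values into a dedicated namespace
helm install ollama ollama-helm/ollama \
  --namespace ollama \
  --create-namespace
```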
## Example setup with LiteLLM and Ollama

1. Pull and serve the model with Ollama:

   ```shell
   ollama pull codegemma:2b
   ollama serve
   ```
1. Create the LiteLLM proxy configuration that routes requests from the AI Gateway for the generic `codegemma` model name to a specific model version. In this example, requests are routed to `codegemma:2b`, which is served by Ollama at `http://localhost:11434`:

   ```yaml
   # config.yaml
   model_list:
     - model_name: codegemma
       litellm_params:
         model: ollama/codegemma:2b
         api_base: http://localhost:11434
   ```
1. Run the proxy:

   ```shell
   litellm --config config.yaml
   ```

   You can also query the proxy directly to confirm that it serves the model, as sketched after these steps.
1. Send a test request:

   ```shell
   curl --request 'POST' \
     'http://localhost:5052/v2/code/completions' \
     -H 'accept: application/json' \
     -H 'Content-Type: application/json' \
     -d '{
       "current_file": {
         "file_name": "app.py",
         "language_identifier": "python",
         "content_above_cursor": "<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>",
         "content_below_cursor": ""
       },
       "model_provider": "litellm",
       "model_endpoint": "http://127.0.0.1:4000",
       "model_name": "codegemma",
       "telemetry": [],
       "prompt_version": 2,
       "prompt": ""
     }' | jq
   ```

   Example response:

   ```json
   {
     "id": "id",
     "model": {
       "engine": "litellm",
       "name": "text-completion-openai/codegemma",
       "lang": "python"
     },
     "experiments": [],
     "object": "text_completion",
     "created": 1718631985,
     "choices": [
       {
         "text": "print(\"Hello, World!\")",
         "index": 0,
         "finish_reason": "length"
       }
     ]
   }
   ```
## Example setup for Codestral with Ollama

When serving the Codestral model through Ollama, an additional step is required to make Codestral work with both code completions and code generations.
1. Pull the Codestral model:

   ```shell
   ollama pull codestral
   ```
1. Edit the default template used for Codestral so that the prompt is passed through unchanged:

   ```shell
   ollama run codestral
   ```

   ```plaintext
   >>> /set template {{ .Prompt }}
   Set prompt template.
   >>> /save codestral
   Created new model 'codestral'
   ```

   A non-interactive alternative using a Modelfile is sketched after these steps.
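If you prefer to script this step instead of using the interactive session, the same template override can be written as a Modelfile. This is a sketch of an equivalent approach; the Modelfile path is arbitrary:

```shell
# Write a Modelfile that bases the model on codestral and passes the prompt through
cat > Modelfile <<'EOF'
FROM codestral
TEMPLATE """{{ .Prompt }}"""
EOF

# Recreate the local "codestral" model with the raw prompt template
ollama create codestral -f Modelfile
```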