Code Suggestions

Tier: Premium, Ultimate
Add-on: GitLab Duo Core, Pro, or Enterprise, GitLab Duo with Amazon Q
Offering: GitLab.com, GitLab Self-Managed, GitLab Dedicated

Introduced support for Google Vertex AI Codey APIs in GitLab 16.1.
Removed support for GitLab native model in GitLab 16.2.
Introduced support for Code Generation in GitLab 16.3.
Generally available in GitLab 16.7.
Changed to require the GitLab Duo Pro add-on on February 15, 2024. Previously, this feature was included with Premium and Ultimate subscriptions.
Changed to require the GitLab Duo Pro or GitLab Duo Enterprise add-on for all supported GitLab versions starting October 17, 2024.
Introduced support for Fireworks AI-hosted Qwen2.5 code completion model in GitLab 17.6, with a flag named fireworks_qwen_code_completion.
Removed support for Qwen2.5 code completion model in GitLab 17.11.
Enabled Fireworks hosted Codestral by default via the feature flag use_fireworks_codestral_code_completion in GitLab 17.11.
Changed to include GitLab Duo Core in GitLab 18.0.
Enabled Fireworks hosted Codestral as the default model in GitLab 18.1.
To opt out of Fireworks for a group, the feature flag code_completion_opt_out_fireworks is available.
Changed the default model for Code Generation to Claude Sonnet 4 in GitLab 18.2.
Removed feature flag code_suggestions_context in GitLab 18.6.

Use GitLab Duo Code Suggestions to write code more efficiently by using generative AI to suggest code while you’re developing.

Prerequisites

To use Code Suggestions, you need:

A GitLab Duo Core, Pro, or Enterprise add-on.
A Premium or Ultimate subscription.
If you have GitLab Duo Pro or Enterprise, an assigned seat.
If you have GitLab Duo Core, IDE features turned on.

GitLab Duo requires GitLab 17.2 or later. For GitLab Duo Core access, and for the best user experience and results, upgrade to GitLab 18.0 or later. Earlier versions might continue to work, but the experience might be degraded.

Use Code Suggestions

Prerequisites:

You must have set up Code Suggestions.

To use Code Suggestions:

Open your Git project in a supported IDE.
Add the project as a remote of your local repository using git remote add.
Add your project directory, including the hidden .git/ folder, to your IDE workspace or project.
Author your code. As you type, suggestions are displayed. Code Suggestions provides code snippets or completes the current line, depending on the cursor position.
Describe the requirements in natural language. Code Suggestions generates functions and code snippets based on the context provided.
When you receive a suggestion, you can do any of the following:
- To accept a suggestion, press Tab.
- To accept a partial suggestion, press either Control+Right arrow or Command+Right arrow.
- To reject a suggestion, press Esc. In Neovim, press Control+E to exit the menu.
- To ignore a suggestion, keep typing as you usually would.

View multiple code suggestions

For a code completion suggestion in VS Code, multiple suggestion options might be available. To view all available suggestions:

Hover over the code completion suggestion.
Scroll through the alternatives. Either:
- Use keyboard shortcuts:
  - On a Mac, press Option+[ to view the previous suggestion, and press Option+] to view the next suggestion.
  - On Linux and Windows, press Alt+[ to view the previous suggestion, and press Alt+] to view the next suggestion.
- On the dialog that’s displayed, select the right or left arrow to see next or previous options.
Press Tab to apply the suggestion you prefer.

Code completion and generation

Code Suggestions uses code completion and code generation:

	Code completion	Code generation
Purpose	Provides suggestions for completing the current line of code.	Generates new code based on a natural language comment.
Trigger	Triggers when typing, usually with a short delay.	Triggers when pressing Enter after writing a comment that includes specific keywords.
Scope	Limited to the current line or small block of code.	Can generate entire methods, functions, or even classes based on the context.
Accuracy	More accurate for small tasks and short blocks of code.	More accurate for complex tasks and large blocks of code because a larger large language model (LLM) is used, additional context is sent in the request (for example, the libraries used by the project), and your instructions are passed to the LLM.
How to use	Code completion automatically suggests completions to the line you are typing.	You write a comment and press Enter, or you enter an empty function or method.
When to use	Use code completion to quickly complete one or a few lines of code.	Use code generation for more complex tasks, larger codebases, when you want to write new code from scratch based on a natural language description, or when the file you’re editing has fewer than five lines of code.

Code Suggestions always uses both of these features. You cannot use only code generation or only code completion.
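As a rough illustration of the triggers in the table above, the following sketch guesses which feature a given cursor position would favor. It is a hypothetical heuristic, not the logic Code Suggestions uses: the function name `guess_mode`, the `small_file_threshold` parameter, and the `#` comment check (Python-style comments only) are assumptions made for this example, and the keyword check mentioned in the table is left out because the keywords are not listed here.

```python
def guess_mode(content_above_cursor: str, small_file_threshold: int = 5) -> str:
    """Return "generation" or "completion" for the text above the cursor.

    Hypothetical heuristic for illustration only; it is not the logic
    that Code Suggestions uses.
    """
    lines = content_above_cursor.split("\n")
    # Lines with actual content, ignoring blank lines.
    non_empty = [line for line in lines if line.strip()]

    # A file with fewer than five lines of code favors code generation.
    if len(non_empty) < small_file_threshold:
        return "generation"

    # The cursor sits on a fresh line (Enter was pressed) right after a comment.
    cursor_on_new_line = lines[-1].strip() == ""
    previous_is_comment = len(lines) >= 2 and lines[-2].lstrip().startswith("#")
    if cursor_on_new_line and previous_is_comment:
        return "generation"

    # Otherwise the user is typing code, which favors code completion.
    return "completion"
```

For example, typing in the middle of a line in a longer file would be classified as completion, while pressing Enter after a comment would be classified as generation.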

View a code completion vs. code generation comparison demo.

Best practices for code generation

To get the best results from code generation:

Be as specific as possible while remaining concise.
State the outcome you want to generate (for example, a function) and provide details on what you want to achieve.
Add additional information, like the framework or library you want to use.
Add a space or new line after each comment. This space tells the code generator that you have completed your instructions.
In GitLab 17.2 and later, when the advanced_context_resolver and code_suggestions_context feature flags are enabled, open related files in other tabs to expand the context that Code Suggestions is aware of.

For example, to create a Python web service with some specific requirements, you might write something like:

# Create a web service using Tornado that allows a user to sign in, run a security scan, and review the scan results.
# Each action (sign in, run a scan, and review results) should be its own resource in the web service
...

AI is non-deterministic, so you may not get the same suggestion every time with the same input. To generate quality code, write clear, descriptive, specific tasks.

For use cases and best practices, follow the GitLab Duo examples documentation.

Truncation of file content

Because of LLM limits and for performance reasons, the content of the currently opened file is truncated:

For code completion: to 32,000 tokens (roughly 128,000 characters).
For code generation: to 200,000 tokens (roughly 800,000 characters).

Content above the cursor is prioritized over content below the cursor. The content above the cursor is truncated from the left side, and content below the cursor is truncated from the right side. These numbers represent the maximum input context size for Code Suggestions.
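The truncation direction described above can be sketched as follows. This is a minimal illustration, not the actual implementation: it counts characters instead of tokens, the name `truncate_context` is hypothetical, and it assumes that "prioritized" means content above the cursor claims the budget first and content below the cursor gets whatever remains.

```python
def truncate_context(above: str, below: str, max_chars: int) -> tuple[str, str]:
    """Trim the text around the cursor so it fits within max_chars.

    Assumption: content above the cursor claims the budget first, and
    content below the cursor gets whatever budget remains.
    """
    # Keep the END of the text above the cursor (truncate from the left).
    kept_above = above[-max_chars:] if max_chars > 0 else ""
    remaining = max_chars - len(kept_above)
    # Keep the START of the text below the cursor (truncate from the right).
    kept_below = below[:remaining] if remaining > 0 else ""
    return kept_above, kept_below
```

With a budget of 5 characters, `truncate_context("abc", "uvwxyz", 5)` keeps all of the text above the cursor and only the first two characters below it, so the text closest to the cursor survives on both sides.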

Output length

Because of LLM limits and for performance reasons, the output of Code Suggestions is limited:

For code completion: to 64 tokens (roughly 256 characters).
For code generation: to 2048 tokens (roughly 7168 characters).
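The character counts are approximations derived from the token limits. As a quick check, this sketch computes the average characters per token implied by each pair of numbers quoted on this page. The dictionary and function names are only for illustration.

```python
# Token limits and approximate character counts quoted on this page.
LIMITS = {
    "completion input": (32_000, 128_000),
    "generation input": (200_000, 800_000),
    "completion output": (64, 256),
    "generation output": (2048, 7168),
}


def chars_per_token(tokens: int, chars: int) -> float:
    """Average number of characters per token implied by a limit pair."""
    return chars / tokens


for name, (tokens, chars) in LIMITS.items():
    print(f"{name}: {chars_per_token(tokens, chars):.1f} characters per token")
# completion input: 4.0 characters per token
# generation input: 4.0 characters per token
# completion output: 4.0 characters per token
# generation output: 3.5 characters per token
```

The actual number of characters that fit in a limit depends on the tokenizer and on the content, so treat these ratios as rough guides.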

Accuracy of results

We are continuing to work on the accuracy of overall generated content. However, Code Suggestions might generate suggestions that are:

Irrelevant.
Incomplete.
Likely to result in failed pipelines.
Potentially insecure.
Offensive or insensitive.

When using Code Suggestions, code review best practices still apply.

Available language models

Different language models can be the source for Code Suggestions.

On GitLab.com: GitLab hosts the models and connects to them through the cloud-based AI gateway.
On GitLab Self-Managed, two options exist:
- GitLab can host the models and connect to them through the cloud-based AI gateway.
- Your organization can use GitLab Duo Self-Hosted, which means you host the AI gateway and language models. You can use GitLab AI vendor models, use other supported language models, or bring your own compatible model.

How the prompt is built

To learn about the code that builds the prompt, see these files:

Code generation: ee/lib/api/code_suggestions.rb in the gitlab repository.
Code completion: ai_gateway/code_suggestions/processing/completions.py in the modelops repository.

Prompt caching

Prompt caching is enabled by default to improve Code Suggestions latency. When prompt caching is enabled, code completion prompt data is temporarily stored in memory by the model vendor. Prompt caching significantly improves latency by avoiding the re-processing of cached prompt and input data. The cached data is never logged to any persistent storage.
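To illustrate why caching reduces latency, here is a toy in-memory cache that skips an expensive processing step when the same prompt is seen again. It is a conceptual sketch only: how model vendors implement prompt caching is not described on this page, and the class name, the time-to-live value, and the stand-in processing step are all assumptions made for this example.

```python
import hashlib
import time


class InMemoryPromptCache:
    """Toy cache that keeps processed prompt data in memory for a short time."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl_seconds = ttl_seconds  # hypothetical retention window
        self._entries: dict[str, tuple[float, str]] = {}
        self.misses = 0  # how many times the expensive step actually ran

    def _expensive_processing(self, prompt: str) -> str:
        # Stand-in for the work a model does to process a prompt.
        self.misses += 1
        return prompt.upper()

    def process(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        now = time.monotonic()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self.ttl_seconds:
            return cached[1]  # cache hit: skip re-processing
        result = self._expensive_processing(prompt)
        # Held in process memory only; nothing is written to disk.
        self._entries[key] = (now, result)
        return result
```

Calling `process` twice with the same prompt runs the expensive step once. The second call returns the stored result, which is where the latency improvement comes from.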

Disable prompt caching

You can disable prompt caching for top-level groups in the GitLab Duo settings.

On GitLab.com:

On the left sidebar, select Search or go to and find your group. If you’ve turned on the new navigation, this field is on the top bar.
Select Settings > GitLab Duo.
Select Change configuration.
Disable the Prompt caching toggle.
Select Save changes.

On GitLab Self-Managed:

On the left sidebar, at the bottom, select Admin. If you’ve turned on the new navigation, in the upper-right corner, select your avatar and then select Admin.
Select GitLab Duo.
Select Change Configuration.
Under Prompt Cache, clear the Turn on prompt caching checkbox.
Select Save changes.

Response time

Code Suggestions is powered by a generative AI model.

For code completion, suggestions are usually low latency and take less than one second.
For code generation, algorithms or large code blocks might take more than five seconds to generate.

Your personal access token enables a secure API connection to GitLab.com or to your GitLab instance. This API connection securely transmits a context window from your IDE or editor to the GitLab AI gateway, a GitLab-hosted service. The gateway calls the large language model APIs, and then the generated suggestion is transmitted back to your IDE or editor.

Streaming

Streaming of Code Generation responses is supported in JetBrains and Visual Studio, leading to perceived faster response times. Other supported IDEs return the generated code in a single block.

Streaming is not enabled for code completion.

Direct and indirect connections

By default, code completion requests are sent from the IDE directly to the AI gateway to minimize latency. For this direct connection to work, the IDE must be able to connect to https://cloud.gitlab.com:443. If this is not possible (for example, because of network restrictions), you can disable direct connections for all users. Code completion requests are then sent indirectly through the GitLab Self-Managed instance, which in turn sends the requests to the AI gateway. Indirect requests might have higher latency.
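To check whether a workstation can open the connection that direct requests need, you might run a small script like the following. This is a minimal sketch that only tests whether a TCP connection can be opened. It does not validate TLS, proxies, or authentication, and the helper name `can_connect` is hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling `can_connect("cloud.gitlab.com", 443)` from the machine that runs the IDE returns `False` when the network blocks the connection, which suggests that indirect connections might be the better option.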

Configure direct or indirect connections

Prerequisites:

You must be an administrator for the GitLab Self-Managed instance.

On the left sidebar, at the bottom, select Admin. If you’ve turned on the new navigation, in the upper-right corner, select your avatar and then select Admin.
Select Settings > General.
Expand GitLab Duo features.
Under Connection method, choose an option:
- To minimize latency for code completion requests, select Direct connections.
- To disable direct connections for all users, select Indirect connections through the GitLab Self-Managed instance.
Select Save changes.

On the left sidebar, at the bottom, select Admin. If you’ve turned on the new navigation, in the upper-right corner, select your avatar and then select Admin.
Select Settings > General.
Expand AI-native features.
Choose an option:
- To enable direct connections and minimize latency for code completion requests, clear the Disable direct connections for code suggestions checkbox.
- To disable direct connections, select the Disable direct connections for code suggestions checkbox.

Feedback

Provide feedback about your Code Suggestions experience in issue 435783.