Code Suggestions data usage

Tier: Premium or Ultimate with GitLab Duo Pro
Offering: GitLab.com, Self-managed, GitLab Dedicated

Code Suggestions is powered by a generative AI model.

Your personal access token enables a secure API connection to GitLab.com or to your GitLab instance. This connection transmits a context window from your IDE or editor to the GitLab AI Gateway, a GitLab-hosted service. The gateway calls the large language model APIs, and the generated suggestion is then transmitted back to your IDE or editor.

GitLab selects best-in-class large language models for specific tasks. We use Google Vertex AI Code Models and Anthropic Claude for Code Suggestions.

View data retention policies.

Telemetry

For self-managed instances that have enabled Code Suggestions and for SaaS accounts, we collect aggregated or de-identified first-party usage data through our Snowplow collector. This usage data includes the following metrics:

  • Language the code suggestion was in (for example, Python)
  • Editor being used (for example, VS Code)
  • Number of suggestions shown, accepted, rejected, or that had errors
  • Length of time that a suggestion was shown
  • Prompt and suffix lengths
  • Model used
  • Number of unique users
  • Number of unique instances
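To make the shape of this usage data more concrete, the following is a minimal sketch of a de-identified event record and a simple aggregation over it. The class name, field names, outcome labels, and the `count_outcomes` helper are all hypothetical and chosen for illustration only. They do not reflect the actual Snowplow event schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SuggestionEvent:
    """Hypothetical de-identified record; it carries no user or project identifiers."""

    language: str        # for example, "python"
    editor: str          # for example, "vscode"
    outcome: str         # assumed labels: "shown", "accepted", "rejected", "error"
    shown_ms: int        # how long the suggestion was shown, in milliseconds
    prompt_length: int   # length of the content before the cursor
    suffix_length: int   # length of the content after the cursor
    model: str           # name of the model that produced the suggestion


def count_outcomes(events: list[SuggestionEvent]) -> Counter:
    """Aggregate events into per-outcome totals."""
    return Counter(event.outcome for event in events)
```

Unique user and instance counts are left out of this sketch because they are aggregate totals, not fields of a single event.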

Inference window context

For inference, Code Suggestions uses the currently opened file, the content before and after the cursor, the filename, and the extension type. For more information on possible future context expansion to improve the quality of suggestions, see epic 11669.
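The inputs listed above can be pictured as a small record built from the open file and the cursor position. The following is a minimal sketch under stated assumptions: the `InferenceContext` class, its field names, and the `build_context` helper are hypothetical and are not the payload format that the IDE extensions or the AI Gateway use.

```python
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class InferenceContext:
    """Hypothetical container for the context described above."""

    filename: str    # name of the currently opened file
    extension: str   # extension type, including the leading dot
    prefix: str      # content before the cursor
    suffix: str      # content after the cursor


def build_context(path: str, content: str, cursor: int) -> InferenceContext:
    """Split the file content at the cursor and record the file name details."""
    if not 0 <= cursor <= len(content):
        raise ValueError("cursor is outside the content")
    file_path = PurePath(path)
    return InferenceContext(
        filename=file_path.name,
        extension=file_path.suffix,
        prefix=content[:cursor],
        suffix=content[cursor:],
    )
```

Here `cursor` is assumed to be a character offset into the file content. A real editor integration would more likely work with line and column positions.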

Truncation of file content

Because of LLM limits and for performance reasons, the content of the currently opened file is truncated:

  • For code completion: to 2048 tokens (roughly 8192 characters).
  • For code generation: to 50,000 characters.

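Content above the cursor is prioritized over content below the cursor. Content above the cursor is truncated from the left, so the text farthest from the cursor at the start of the file is dropped first. Content below the cursor is truncated from the right, so the text farthest from the cursor at the end of the file is dropped first.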
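The truncation rules above can be sketched as follows. This is an illustration and not the GitLab implementation. It works in characters instead of tokens, using the rough ratio of 4 characters per token implied by the completion limit. It also assumes that the content above the cursor may consume the entire budget before any content below the cursor is kept, because the exact split between the two is not stated here. The function and constant names are hypothetical.

```python
# Character budgets taken from the limits listed above. The completion figure
# uses the rough ratio of 4 characters per token (2048 * 4 = 8192).
COMPLETION_MAX_CHARS = 2048 * 4
GENERATION_MAX_CHARS = 50_000


def truncate_context(content: str, cursor: int, max_chars: int) -> tuple[str, str]:
    """Return (prefix, suffix) around the cursor, trimmed to max_chars in total.

    Assumption: the prefix may use the whole budget and the suffix gets
    whatever is left over.
    """
    if max_chars < 0:
        raise ValueError("max_chars must not be negative")
    if not 0 <= cursor <= len(content):
        raise ValueError("cursor is outside the content")

    prefix, suffix = content[:cursor], content[cursor:]

    # Truncate the prefix from the left: keep the characters nearest the cursor.
    prefix = prefix[max(0, len(prefix) - max_chars):]

    # Truncate the suffix from the right: keep the characters nearest the cursor.
    suffix = suffix[: max_chars - len(prefix)]

    return prefix, suffix
```

For example, `truncate_context("abcdefghij", 3, 5)` returns `("abc", "de")`: the three characters above the cursor are kept in full, and only two characters below the cursor fit in the remaining budget.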

Training data

GitLab does not train generative AI models based on private (non-public) data. The vendors we work with also do not train models based on private data.

For more information on GitLab Code Suggestions data sub-processors, see: