GitLab Duo data usage

GitLab Duo uses generative AI to help increase your velocity and make you more productive. Each AI-powered feature operates independently and is not required for other features to function.

GitLab uses best-in-class large language models (LLMs) for specific tasks. These include Google Vertex AI models and Anthropic Claude.

Progressive enhancement

GitLab Duo AI-powered features are designed as a progressive enhancement to existing GitLab features across the DevSecOps platform. These features are designed to fail gracefully and should not prevent the core functionality of the underlying feature from working. Note that each feature is subject to its expected functionality as defined by the relevant feature support policy.
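
The graceful-degradation pattern described above can be sketched as follows. All names here are hypothetical illustrations, not GitLab's actual implementation: the point is that the core feature renders whether or not the AI enhancement responds.

```python
# Sketch of progressive enhancement with graceful failure.
# Function and variable names are illustrative only.

def ai_summary(issue_text: str):
    """Call a hypothetical AI backend; return None on any failure."""
    try:
        # Placeholder for a real model call (for example, over HTTP).
        raise TimeoutError("model backend unavailable")
    except Exception:
        # Fail gracefully: the AI enhancement is optional.
        return None

def render_issue(issue_text: str) -> str:
    summary = ai_summary(issue_text)
    # Core functionality works whether or not the AI feature responds.
    if summary is not None:
        return f"{summary}\n---\n{issue_text}"
    return issue_text

print(render_issue("Fix the login redirect bug."))
```

Because the AI call is wrapped and optional, an outage in the model backend degrades the experience (no summary) without breaking the underlying feature.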

Stability and performance

GitLab Duo AI-powered features span a variety of feature support levels. Due to the nature of these features, high demand may cause degraded performance or unexpected downtime. These features are built to degrade gracefully, and controls are in place to mitigate abuse or misuse. GitLab may disable beta and experimental features for any or all customers at any time at our discretion.
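
One common control for mitigating abuse of a high-demand feature is request rate limiting. The following sliding-window limiter is a minimal sketch of that idea, not a description of GitLab's actual controls:

```python
# Minimal sliding-window rate limiter (illustrative only).
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.calls = deque()  # timestamps of recent allowed calls

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop calls that fall outside the window.
        while self.calls and now - self.calls[0] > self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_requests:
            self.calls.append(now)
            return True
        return False

limiter = RateLimiter(max_requests=2, window_seconds=60)
print([limiter.allow() for _ in range(3)])  # third call is rejected
```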

Data privacy

GitLab Duo AI-powered features are powered by a generative AI model. The processing of any personal data is in accordance with our Privacy Statement. You may also visit the Sub-Processors page to see the list of Sub-Processors we use to provide these features.

Data retention

The following are the current retention periods of GitLab's AI model Sub-Processors:

  • Anthropic discards model input and output data immediately after the output is provided. Anthropic currently does not store data for abuse monitoring. Model input and output is not used to train models.
  • Google discards model input and output data immediately after the output is provided. Google currently does not store data for abuse monitoring. Model input and output is not used to train models.

All of these AI providers are under data protection agreements with GitLab that prohibit the use of Customer Content for their own purposes, except to perform their independent legal obligations.

GitLab retains input and output for up to 30 days for the purpose of troubleshooting, debugging, and addressing latency issues.

Training data

GitLab does not train generative AI models based on private (non-public) data. The vendors we work with also do not train models based on private data.

For more information on our AI sub-processors, see the Sub-Processors page.

Telemetry

GitLab Duo collects aggregated or de-identified first-party usage data through a Snowplow collector. This usage data includes the following metrics:

  • Number of unique users
  • Number of unique instances
  • Prompt and suffix lengths
  • Model used
  • Status code responses
  • API response times
  • Code Suggestions also collects:
    • Language the suggestion was in (for example, Python)
    • Editor being used (for example, VS Code)
    • Number of suggestions shown, accepted, rejected, or that had errors
    • Duration of time that a suggestion was shown
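
To make the metrics above concrete, an aggregated Code Suggestions usage event might look like the following. The field names are hypothetical and do not reflect GitLab's actual Snowplow schema; the key point is that the event carries counts, timings, and de-identified metadata, never the code itself:

```python
# Hypothetical shape of an aggregated Code Suggestions usage event.
# Field names are illustrative only, not GitLab's actual schema.

suggestion_event = {
    "unique_user_hash": "a1b2c3",   # de-identified user
    "instance_id": "inst-42",       # unique instance
    "prompt_length": 512,           # prompt length
    "suffix_length": 128,           # suffix length
    "model": "claude",              # model used
    "status_code": 200,             # status code response
    "response_time_ms": 350,        # API response time
    "language": "Python",           # language of the suggestion
    "editor": "VS Code",            # editor being used
    "shown": 3,                     # suggestions shown
    "accepted": 1,                  # suggestions accepted
    "shown_duration_ms": 1200,      # time a suggestion was shown
}

# The event contains no prompt text or suggestion content.
assert "code" not in suggestion_event
```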

Model accuracy and quality

Generative AI may produce unexpected results that may:

  • Be low-quality, incoherent, or incomplete
  • Produce failed pipelines
  • Include insecure code
  • Be offensive or insensitive
  • Contain out-of-date information

GitLab is actively iterating on all of our AI-assisted capabilities to improve the quality of the generated content. We improve quality through prompt engineering, by evaluating new AI/ML models to power these features, and through heuristics built directly into the features themselves.