GitLab Duo data usage

GitLab Duo uses generative AI to help increase your velocity and make you more productive. Each AI-powered feature operates independently and is not required for other features to function.

GitLab uses best-in-class large language models (LLMs) for specific tasks. These include Google Vertex AI models and Anthropic Claude.

Progressive enhancement

GitLab Duo AI-powered features are designed as a progressive enhancement to existing GitLab features across the DevSecOps platform. These features are designed to fail gracefully and should not prevent the core functionality of the underlying feature from working. Note that each feature is subject to its expected functionality as defined by the relevant feature support policy.
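
The graceful-degradation pattern described above can be sketched as follows. All names here are hypothetical illustrations, not GitLab's actual implementation: the point is that the core feature renders whether or not the AI enhancement responds.

```python
# Sketch of progressive enhancement with graceful failure.
# Function and variable names are illustrative only.

def ai_summary(issue_text: str):
    """Call a hypothetical AI backend; return None on any failure."""
    try:
        # Placeholder for a real model call (for example, over HTTP).
        raise TimeoutError("model backend unavailable")
    except Exception:
        # Fail gracefully: the AI enhancement is optional.
        return None

def render_issue(issue_text: str) -> str:
    summary = ai_summary(issue_text)
    # Core functionality works whether or not the AI feature responds.
    if summary is not None:
        return f"{summary}\n---\n{issue_text}"
    return issue_text

print(render_issue("Fix the login redirect bug."))
```

Because the AI call is wrapped and optional, an outage in the model backend degrades the experience (no summary) without breaking the underlying feature.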

Stability and performance

GitLab Duo AI-powered features span a variety of feature support levels. Due to the nature of these features, high demand may cause degraded performance or unexpected downtime. These features are built to degrade gracefully, and controls are in place to mitigate abuse or misuse. GitLab may disable beta and experimental features for any or all customers at any time at our discretion.
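
One common control for mitigating abuse of a high-demand feature is request rate limiting. The following sliding-window limiter is a minimal sketch of that idea, not a description of GitLab's actual controls:

```python
# Minimal sliding-window rate limiter (illustrative only).
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.calls = deque()  # timestamps of recent allowed calls

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop calls that fall outside the window.
        while self.calls and now - self.calls[0] > self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_requests:
            self.calls.append(now)
            return True
        return False

limiter = RateLimiter(max_requests=2, window_seconds=60)
print([limiter.allow() for _ in range(3)])  # third call is rejected
```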

Data privacy

GitLab Duo AI-powered features are powered by a generative AI model. The processing of any personal data is in accordance with our Privacy Statement. You may also visit the Sub-Processors page to see the list of Sub-Processors we use to provide these features.

Data retention

The following are the current retention periods of GitLab's AI model Sub-Processors:

  • Anthropic discards model input and output data immediately after the output is provided. Anthropic currently does not store data for abuse monitoring. Model input and output is not used to train models.
  • Google discards model input and output data immediately after the output is provided. Google currently does not store data for abuse monitoring. Model input and output is not used to train models.

All of these AI providers are under data protection agreements with GitLab that prohibit the use of Customer Content for their own purposes, except to perform their independent legal obligations.

GitLab retains input and output for up to 30 days for the purpose of troubleshooting, debugging, and addressing latency issues.

Training data

GitLab does not train generative AI models based on private (non-public) data. The vendors we work with also do not train models based on private data.

For more information on our AI sub-processors, see the Sub-Processors page.

Telemetry

GitLab Duo collects aggregated or de-identified first-party usage data through a Snowplow collector. This usage data includes the following metrics:

  • Number of unique users
  • Number of unique instances
  • Prompt and suffix lengths
  • Model used
  • Status code responses
  • API response times
  • Code Suggestions also collects:
    • Language the suggestion was in (for example, Python)
    • Editor being used (for example, VS Code)
    • Number of suggestions shown, accepted, rejected, or that had errors
    • Duration of time that a suggestion was shown
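
To make the metrics above concrete, an aggregated Code Suggestions usage event might look like the following. The field names are hypothetical and do not reflect GitLab's actual Snowplow schema; the key point is that the event carries counts, timings, and de-identified metadata, never the code itself:

```python
# Hypothetical shape of an aggregated Code Suggestions usage event.
# Field names are illustrative only, not GitLab's actual schema.

suggestion_event = {
    "unique_user_hash": "a1b2c3",   # de-identified user
    "instance_id": "inst-42",       # unique instance
    "prompt_length": 512,           # prompt length
    "suffix_length": 128,           # suffix length
    "model": "claude",              # model used
    "status_code": 200,             # status code response
    "response_time_ms": 350,        # API response time
    "language": "Python",           # language of the suggestion
    "editor": "VS Code",            # editor being used
    "shown": 3,                     # suggestions shown
    "accepted": 1,                  # suggestions accepted
    "shown_duration_ms": 1200,      # time a suggestion was shown
}

# The event contains no prompt text or suggestion content.
assert "code" not in suggestion_event
```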

Model accuracy and quality

Generative AI may produce unexpected results that may:

  • Be low-quality, incoherent, or incomplete
  • Produce failed pipelines
  • Include insecure code
  • Be offensive or insensitive
  • Contain out-of-date information

GitLab is actively iterating on all of our AI-assisted capabilities to improve the quality of the generated content. We improve quality through prompt engineering, by evaluating new AI/ML models to power these features, and through heuristics built directly into the features themselves.