# Model Migration Process

## Introduction

LLMs evolve constantly, and GitLab needs to regularly update its AI features to support newer models. This guide provides a structured approach for migrating AI features to new models while maintaining stability and reliability.
## Purpose

Provide a comprehensive guide for migrating AI models within GitLab.
## Expected Duration

Model migrations typically follow these general timelines:

**Simple model updates (same provider): 2-3 weeks**

- Example: upgrading from Claude Sonnet 3.5 to 3.6
- Involves model validation, testing, and staged rollout
- Primary focus on maintaining stability and performance
- Can sometimes be expedited when urgent, but 2 weeks is standard

**Complex migrations: 1-2 months (a full milestone or longer)**

- Example: adding support for a new provider such as AWS Bedrock
- Example: major version upgrades with breaking changes (e.g., Claude 2 to 3)
- Requires significant API integration work
- May need infrastructure changes
- Extensive testing and validation required
### Timeline Factors

Several factors can impact migration timelines:

- Current system stability and recent incidents
- Resource availability and competing priorities
- Complexity of behavioral changes in the new model
- Scale of testing required
- Feature flag rollout strategy
### Best Practices

- Always err on the side of caution with initial timeline estimates
- Use feature flags for gradual rollouts to minimize risk
- Plan for buffer time to handle unexpected issues
- Communicate conservative timelines externally while working to deliver faster
- Prioritize system stability over speed of deployment
While some migrations can technically be completed quickly, we typically plan for longer timelines to ensure proper testing and staged rollouts. This approach helps maintain system stability and reliability.
## Scope

Applicable to all AI model-related teams at GitLab. We currently support using Anthropic and Google Vertex models. Support for AWS Bedrock models is proposed in issue 498119.
## Prerequisites

Before starting a model migration:
- Create an issue under the AI Model Version Migration Initiative epic with the following:
  - Label it with `group::ai framework`
  - Document any known behavioral changes or improvements in the new model
  - Include any breaking changes or compatibility issues
  - Reference any model provider documentation about the changes
- Verify the new model is supported in our current AI gateway API specification by:
  - Checking model definitions in AI gateway:
    - For LiteLLM models: `ai_gateway/models/v2/container.py`
    - For Anthropic models: `ai_gateway/models/anthropic.py`
    - For new providers: create a new model definition file in `ai_gateway/models/`
  - Verifying model configurations:
    - Model enum definitions
    - Stop tokens
    - Timeout settings
    - Completion type (text or chat)
    - Max token limits
  - Testing the model locally in AI gateway:
    - Set up the AI gateway development environment
    - Configure the necessary API keys in your `.env` file
    - Test the model using the Swagger UI at `http://localhost:5052/docs`
  - Creating an issue in the AI gateway repository to add support if the model isn't supported
  - Reviewing the provider's API documentation for any breaking changes
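The configuration checks above can be expressed as a required-fields validation. This is a hedged sketch: the field names mirror the checklist, not the actual AI gateway schema, and the sample values are illustrative.

```ruby
# Illustrative required-fields check for a new model configuration.
# Field names follow the checklist above, not the real AI gateway schema.
REQUIRED_MODEL_FIELDS = %i[name stop_tokens timeout completion_type max_tokens].freeze

# Returns the checklist fields that are absent from a config hash.
def missing_model_fields(config)
  REQUIRED_MODEL_FIELDS.reject { |field| config.key?(field) }
end

NEW_MODEL_CONFIG = {
  name: 'new-model-id',          # model enum definition (placeholder id)
  stop_tokens: ["\n\nHuman:"],   # provider-specific stop sequences
  timeout: 30,                   # seconds
  completion_type: :chat,        # :text or :chat
  max_tokens: 4096               # max token limit
}.freeze

missing_model_fields(NEW_MODEL_CONFIG) # => []
```

A config missing, say, `timeout` would surface it immediately instead of failing at request time.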
- Ensure you have access to testing environments and monitoring tools
- Complete model evaluation using the Prompt Library
Documentation of model changes is crucial for tracking the impact of migrations and helping with future troubleshooting. Always create an issue to track these changes before beginning the migration process.
## Migration Tasks

### Migration Tasks for Anthropic Models
- Optional: investigate whether the new model is supported within our current AI gateway API specification. This step can usually be skipped, but supporting a newer model sometimes requires accommodating a new API format.
- Add the new model to our list of available models.
- Change the default model in our AI gateway client. Put the change behind a feature flag so it can be rolled back quickly if needed.
- Update the model definitions in AI gateway following the prompt definition guidelines.

Note: while we are moving toward the AI gateway holding the prompts, feature flag implementation still requires a GitLab release.
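The flag-gated default model change can be sketched as follows. This is illustrative only: the module and model identifiers are placeholders, and real GitLab code would call `Feature.enabled?` with the flag name and actor rather than taking a boolean.

```ruby
# Hypothetical sketch of gating the default model behind a feature flag.
# Keeping the old model as the fallback means disabling the flag
# instantly rolls the change back without a deploy.
module AiGatewayClient
  PREVIOUS_MODEL = 'previous-model'
  NEW_MODEL = 'new-model'

  # In real GitLab code: Feature.enabled?(:some_migration_flag, user)
  def self.default_model(flag_enabled:)
    flag_enabled ? NEW_MODEL : PREVIOUS_MODEL
  end
end

AiGatewayClient.default_model(flag_enabled: true)  # => "new-model"
AiGatewayClient.default_model(flag_enabled: false) # => "previous-model"
```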
### Migration Tasks for Vertex Models

Work in progress.
## Feature Flag Process

### Implementation Steps

For implementing feature flags, refer to our Feature Flags Development Guidelines.
Feature flag implementations will affect self-hosted cloud-connected customers. These customers won’t receive the model upgrade until the feature flag is removed from the AI gateway codebase, as they won’t have access to the new GitLab release.
### Model Selection Implementation
The model selection logic should be implemented in:
- AI gateway client (`ee/lib/gitlab/llm/chain/requests/ai_gateway.rb`)
- Model definitions in AI gateway
- Any custom implementations in specific features that override the default model
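The resolution order implied by the list above (a feature-specific override wins over the client default) can be sketched like this; the names are illustrative, not the actual GitLab classes.

```ruby
# Hypothetical model-resolution sketch: per-feature overrides take
# precedence over the gateway client's default model.
DEFAULT_MODEL = 'default-model'
FEATURE_OVERRIDES = {
  'explain_vulnerability' => 'override-model' # feature pinned to its own model
}.freeze

def model_for(feature)
  FEATURE_OVERRIDES.fetch(feature, DEFAULT_MODEL)
end

model_for('explain_vulnerability') # => "override-model"
model_for('duo_chat')              # => "default-model"
```

Overrides like this are why a migration must audit individual features, not just the client default.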
### Rollout Strategy
- Enable the feature flag for a small percentage of users/groups initially
- Monitor performance metrics and error rates using:
- Sidekiq Service dashboard for error ratios and response latency
- AI gateway metrics dashboard for gateway-specific metrics
- AI gateway logs for detailed error investigation
- Feature usage dashboard for adoption metrics
- Periscope dashboard for token usage and feature statistics
- Gradually increase the rollout percentage
- If issues arise, quickly disable the feature flag to roll back to the previous model
- Once stability is confirmed, remove the feature flag and make the migration permanent
For more details on monitoring during migrations, see the Monitoring and Metrics section below.
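The gradual percentage rollout above can be sketched with deterministic bucketing: hash each actor into one of 100 buckets and enable the flag for buckets below the rollout percentage. This mirrors the idea behind percentage-of-actors feature flags; the hashing scheme here is illustrative, not GitLab's actual implementation.

```ruby
require 'digest'

# Hypothetical percentage-of-actors sketch. The same actor always lands
# in the same bucket, so raising the percentage only ever adds actors to
# the cohort; it never flips an already-enabled actor off.
def enabled_for?(actor_id, percentage)
  bucket = Digest::SHA256.hexdigest(actor_id.to_s).to_i(16) % 100
  bucket < percentage
end

# At a 10% rollout, roughly a tenth of actors see the new model.
cohort = (1..1000).count { |id| enabled_for?(id, 10) }
```

Disabling the flag (percentage 0) routes every actor back to the previous model at once, which is what makes the rollback in the list above fast.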
## Scope of Work

### AI Features to Migrate
- Duo Chat Tools:
  - `ci_editor_assistant/prompts/anthropic.rb` - CI Editor
  - `gitlab_documentation/executor.rb` - GitLab Documentation
  - `epic_reader/prompts/anthropic.rb` - Epic Reader
  - `issue_reader/prompts/anthropic.rb` - Issue Reader
  - `merge_request_reader/prompts/anthropic.rb` - Merge Request Reader
- Chat Slash Commands:
  - `refactor_code/prompts/anthropic.rb` - Refactor
  - `write_tests/prompts/anthropic.rb` - Write Tests
  - `explain_code/prompts/anthropic.rb` - Explain Code
  - `explain_vulnerability/executor.rb` - Explain Vulnerability
- Experimental Tools:
  - Summarize Comments Chat
  - Fill MR Description