Model Migration Process

Current Migration Issues

The table below shows current open issues labeled with AI Model Migration. This provides a live view of ongoing model migration work across GitLab.

```glql
display: table
fields: title, author, assignee, milestone, labels, updated
limit: 10
query: label = "AI Model Migration" AND opened = true
```

Note: This table is dynamically generated using GitLab Query Language (GLQL) when viewing the rendered documentation. It shows up to 10 open issues with the AI Model Migration label, sorted by most recently updated.
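GLQL renders only inside GitLab. To pull a similar list from a script, the GitLab REST API offers an equivalent issues query. The sketch below only builds the request URL for the group-level `GET /groups/:id/issues` endpoint. The host and group path are placeholders, and scoping to a single group is an assumption, since the GLQL query above has no explicit scope. Actually fetching the list still needs an authenticated HTTP request (for example with a `PRIVATE-TOKEN` header).

```python
from urllib.parse import quote, urlencode


def migration_issues_url(host: str, group_path: str, limit: int = 10) -> str:
    """Build a REST API URL listing open "AI Model Migration" issues in a group.

    host and group_path are placeholders; :id in /groups/:id/issues may be
    a URL-encoded group path such as "parent%2Fchild".
    """
    params = {
        "labels": "AI Model Migration",
        "state": "opened",
        "order_by": "updated_at",  # most recently updated first, like the table above
        "sort": "desc",
        "per_page": limit,
    }
    group_id = quote(group_path, safe="")  # encode "/" in nested group paths
    return f"https://{host}/api/v4/groups/{group_id}/issues?{urlencode(params)}"
```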

Introduction

LLMs evolve constantly, and GitLab must regularly update its AI features to support newer models. This guide provides a structured approach for migrating AI features to new models while maintaining stability and reliability.

Model Migration Timelines

Model migrations typically follow these general timelines:

  • Simple Model Updates (Same Provider): 1-2 weeks

    • Example: Upgrading from Claude Sonnet 3.5 to 3.7
    • Involves model validation, testing, and staged rollout
    • Primary focus on maintaining stability and performance
  • Complex Migrations: 1-2 months (full milestone or longer)

    • Example: Adding support for a new provider like AWS Bedrock
    • Example: Major version upgrades with breaking changes (e.g., Claude 2 to 3)
    • Requires significant API integration work
    • May need infrastructure changes

Timeline Factors

Several factors can impact migration timelines:

  • Current system stability and recent incidents
  • Resource availability and competing priorities
  • Complexity of behavioral changes in new model
  • Scale of testing required
  • Feature flag rollout strategy

Best Practices

  • Always err on the side of caution with initial timeline estimates
  • Use feature flags for gradual rollouts to minimize risk
  • Plan for buffer time to handle unexpected issues
  • Prioritize system stability over speed of deployment

While some migrations can technically be completed quickly, we typically plan for longer timelines to ensure proper testing and staged rollouts. This approach helps maintain system stability and reliability.

Team Responsibilities

Model migrations involve several teams working together. This section clarifies which teams are responsible for different aspects of the migration process.

RACI Matrix for Model Migrations

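| Task | AI Framework | Feature Teams | Product | Infrastructure |
|------|--------------|---------------|---------|----------------|
| Model configuration file creation | R/A | C | I | I |
| Infrastructure compatibility | R/A | I | I | C |
| Feature-specific prompt adjustments | C | R/A | I | I |
| Evaluations & testing | C | R/A | I | I |
| Feature flag implementation | C | R/A | I | I |
| Rollout planning | C | R/A | C | I |
| Documentation updates | C | R/A | C | I |
| Monitoring & incident response | C | R/A | I | C |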

R = Responsible, A = Accountable, C = Consulted, I = Informed
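When a script needs to look up who owns what (for example, to generate a per-team migration checklist), the matrix can be encoded as data. This is a minimal, hypothetical Python sketch, not part of any GitLab tooling. It also enforces the usual RACI rule that every task has exactly one accountable team.

```python
# Column order matches the RACI matrix above.
ROLES = ("AI Framework", "Feature Teams", "Product", "Infrastructure")

RACI = {
    "Model configuration file creation": ("R/A", "C", "I", "I"),
    "Infrastructure compatibility": ("R/A", "I", "I", "C"),
    "Feature-specific prompt adjustments": ("C", "R/A", "I", "I"),
    "Evaluations & testing": ("C", "R/A", "I", "I"),
    "Feature flag implementation": ("C", "R/A", "I", "I"),
    "Rollout planning": ("C", "R/A", "C", "I"),
    "Documentation updates": ("C", "R/A", "C", "I"),
    "Monitoring & incident response": ("C", "R/A", "I", "C"),
}


def accountable_for(task: str) -> str:
    """Return the single team marked "A" for a task; fail if there is not exactly one."""
    owners = [role for role, code in zip(ROLES, RACI[task]) if "A" in code.split("/")]
    if len(owners) != 1:
        raise ValueError(f"{task!r} must have exactly one accountable team, found {owners}")
    return owners[0]


def tasks_owned_by(role: str) -> list[str]:
    """List the tasks for which a team is accountable."""
    idx = ROLES.index(role)
    return [task for task, codes in RACI.items() if "A" in codes[idx].split("/")]
```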

Migration Process

Model Mapping Resource: You can see which features use which models and versions via the GitLab AI Features - Default GitLab AI Vendor Models page.

Standard Migration Process

  1. Initialization

    • AI Framework team creates an issue in the AI Model Version Migration Initiative epic
    • Issue should use the naming convention: AI Model Migration - Provider/Model/Version
    • Apply the AI Model Migration label
    • AI Framework team adds model configuration to AI Gateway
    • AI Framework team verifies infrastructure compatibility
  2. Feature Team Implementation

    • Feature teams create implementation plans
    • Feature teams adjust prompts if needed
    • Feature teams implement feature flags for controlled rollout
  3. Testing & Validation

    • Feature teams run evaluations against the new model
    • AI Framework team provides evaluation support
  4. Deployment

    • Feature teams manage feature flag rollout
    • Feature teams monitor performance and make adjustments
  5. Completion

    • Feature teams remove feature flags when migration is complete
    • Feature teams update documentation
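The initialization step's naming and labeling convention can be captured in a small helper so that issues created from scripts stay consistent. This is a hypothetical sketch: `MigrationIssue` and `as_api_params` are illustrative names. The `title` and comma-separated `labels` fields match the parameters of GitLab's `POST /projects/:id/issues` endpoint.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationIssue:
    """Hypothetical payload for the AI Framework team's migration issue."""

    provider: str
    model: str
    version: str
    labels: list[str] = field(default_factory=lambda: ["AI Model Migration"])

    @property
    def title(self) -> str:
        # Naming convention: AI Model Migration - Provider/Model/Version
        return f"AI Model Migration - {self.provider}/{self.model}/{self.version}"

    def as_api_params(self) -> dict[str, str]:
        # The issues API accepts labels as a single comma-separated string.
        return {"title": self.title, "labels": ",".join(self.labels)}
```

For example, `MigrationIssue("Anthropic", "Claude Sonnet", "3.7").title` yields `AI Model Migration - Anthropic/Claude Sonnet/3.7`.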

Model Deprecation Process

  1. Identification & Planning

    • AI Framework team monitors provider announcements
    • AI Framework team creates an epic: Replace discontinued [model] with [replacement]
    • Epic should have the AI Model Migration label
    • Set due date at least 2-4 weeks before provider’s cutoff date
    • AI Framework team identifies replacement models
  2. Evaluation

    • AI Framework team evaluates replacement models
    • Feature teams test affected features with candidates
    • Teams determine the best replacement model
  3. Implementation

    • AI Framework team creates model configuration files
    • Feature teams update features to use the replacement model
    • Teams implement feature flags for controlled rollout
  4. Testing

    • Feature teams run comprehensive evaluations
    • Teams document performance metrics
  5. Deployment

    • Feature teams manage phased rollout via feature flags
    • Teams monitor performance closely
    • Rollout expands gradually based on performance
  6. Completion

    • Remove feature flags when migration is complete
    • Update documentation
    • Clean up deprecated model references
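For the final cleanup step, a small script can list the remaining references to a deprecated model before they are removed. This is a minimal sketch that assumes model identifiers appear verbatim in source and configuration files. `find_model_references` is a hypothetical helper, and `code-gecko@002` is used only as an illustrative Vertex AI identifier.

```python
import os
import re


def find_model_references(root, model_ids, exts=(".py", ".rb", ".yml", ".yaml")):
    """Return (path, line_number, model_id) for each occurrence of a deprecated model id."""
    # Longest ids first, so "code-gecko@002" wins over a shorter prefix like "code-gecko".
    ordered = sorted(model_ids, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in ordered))
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                for lineno, line in enumerate(fh, start=1):
                    for match in pattern.finditer(line):
                        hits.append((path, lineno, match.group(0)))
    return hits
```

A plain `git grep` covers a one-off check. A script like this mainly helps when the list of identifiers is long or the check runs in CI.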

Prerequisites for Model Migration

Before starting a model migration:

  1. Create an issue under the AI Model Version Migration Initiative epic:

    • Label with group::ai framework and AI Model Migration
    • Document behavioral changes or improvements
    • Include any breaking changes or compatibility issues
    • Reference provider documentation
  2. Verify model support in AI Gateway:

    • Check model definitions:
      • For LiteLLM models: ai_gateway/models/v2/container.py
      • For Anthropic models: ai_gateway/models/anthropic.py
      • For new providers: Create new model definition file
    • Verify configurations (enums, stop tokens, timeouts, etc.)
    • Test the model locally:
    • Create an issue for new model support if needed
    • Review provider API documentation for breaking changes
  3. Ensure access to testing environments and monitoring tools

  4. Complete model evaluation using the Prompt Library
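Configuration checks (enums, stop tokens, timeouts) are easier to repeat when they are written down as code. The sketch below assumes a hypothetical `ModelConfig` shape; the real AI Gateway model definitions in the files listed above are structured differently. The provider names and the 600-second timeout ceiling are illustrative values, not GitLab limits.

```python
from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    """Hypothetical model definition; the real AI Gateway classes differ."""

    name: str
    provider: str
    stop_sequences: list[str] = field(default_factory=list)
    timeout_seconds: float = 30.0
    max_tokens: int = 2048


KNOWN_PROVIDERS = {"anthropic", "vertex", "litellm"}  # illustrative enum values


def validate(config: ModelConfig) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if config.provider not in KNOWN_PROVIDERS:
        problems.append(f"unknown provider {config.provider!r}")
    if any(not s for s in config.stop_sequences):
        problems.append("stop sequences must be non-empty strings")
    if not 0 < config.timeout_seconds <= 600:
        problems.append(f"timeout {config.timeout_seconds}s outside (0, 600]")
    if config.max_tokens <= 0:
        problems.append("max_tokens must be positive")
    return problems
```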

Additional Prerequisites for Model Deprecations

For model deprecations:

  1. Create an epic when a deprecation is announced:

    • Label with group::ai framework and AI Model Migration
    • Document the deprecation timeline
    • Include provider migration recommendations
    • Reference the deprecation announcement
    • List all affected features
  2. Evaluate replacement models:

    • Document evaluation criteria
    • Run comparative evaluations
    • Consider regional availability
    • Assess infrastructure changes required
  3. Create migration timeline:

    • Set completion target at least 2-4 weeks before cutoff
    • Include time for each feature update
    • Plan for gradual rollout
    • Allow time for infrastructure changes
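The completion target is simple date arithmetic from the provider's cutoff. This minimal sketch defaults to the conservative end of the 2-4 week window and rejects anything shorter than 2 weeks. `migration_due_date` is a hypothetical helper name.

```python
from datetime import date, timedelta


def migration_due_date(provider_cutoff: date, buffer_weeks: int = 4) -> date:
    """Latest completion target for a deprecation migration.

    The process asks for at least 2-4 weeks of buffer before the provider's
    cutoff, so buffers shorter than 2 weeks are rejected.
    """
    if buffer_weeks < 2:
        raise ValueError("buffer must be at least 2 weeks before the provider cutoff")
    return provider_cutoff - timedelta(weeks=buffer_weeks)
```

For a cutoff of 30 June 2025, the default 4-week buffer gives a target of 2 June 2025, and the minimum 2-week buffer gives 16 June 2025.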

Documentation of model changes and deprecations is crucial for tracking impact and future troubleshooting. Always create an issue before beginning any migration process.

Implementation Guidelines

Feature Team Migration Template

Feature teams should use the AI Model Rollout template to implement model migrations. See an example from our Claude 3.7 Sonnet Code Generation Rollout Plan.

Anthropic Model Migration Tasks

AI Framework Team:

  • Add new model to AI gateway configurations
  • Verify compatibility with current API specification
  • Verify the model works with existing API patterns
  • Create model configuration file
  • Document model-specific parameters or behaviors
  • Verify infrastructure compatibility
  • Update model definitions following prompt definition guidelines

Feature Team:

  • Add new model to available models list
  • Change the default model in the AI gateway client behind a feature flag
  • Update model references in feature-specific code
  • Implement feature flags for controlled rollout
  • Test prompts with new model
  • Monitor performance during rollout
  • Update documentation
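The feature team's "available models list" and "default model behind a feature flag" tasks combine as follows. In GitLab the flag lives in the Rails application and the client is Ruby, so this Python sketch only illustrates the logic. The Anthropic model identifiers are examples; confirm exact IDs in the provider documentation.

```python
from typing import Optional

# Illustrative Anthropic model identifiers.
CURRENT_DEFAULT = "claude-3-5-sonnet-20240620"
CANDIDATE = "claude-3-7-sonnet-20250219"

AVAILABLE_MODELS = {CURRENT_DEFAULT, CANDIDATE}


def default_model(candidate_flag_enabled: bool) -> str:
    """Serve the new model only while its feature flag is on."""
    return CANDIDATE if candidate_flag_enabled else CURRENT_DEFAULT


def resolve_model(requested: Optional[str], candidate_flag_enabled: bool) -> str:
    """Honor an explicit request only if the model is in the available models list."""
    if requested is None:
        return default_model(candidate_flag_enabled)
    if requested not in AVAILABLE_MODELS:
        raise ValueError(f"model {requested!r} is not in the available models list")
    return requested
```

Turning the flag off restores the previous default immediately, which is what makes the flag usable as a rollback switch.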

Although we are moving toward having the AI gateway hold the prompts, feature flag implementation still requires a GitLab release.

Vertex Models Migration Tasks

AI Framework Team:

  • Activate model in Google Cloud Platform
  • Update AI gateway to support new Vertex model
  • Document model-specific parameters

Feature Team:

  • Update model references in feature-specific code
  • Implement feature flags for controlled rollout
  • Test prompts with new model
  • Monitor performance during rollout
  • Update documentation
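Once a model is activated in Google Cloud, the AI gateway addresses it through a Vertex AI publisher-model endpoint. This sketch builds that REST URL. The project ID and location are placeholders, and `vertex_model_url` is a hypothetical helper.

```python
def vertex_model_url(project: str, location: str, model: str, method: str = "predict") -> str:
    """Vertex AI REST endpoint for a Google publisher model.

    Older PaLM-era models such as code-gecko use ":predict"; Gemini models
    use ":generateContent", so the method suffix can change during a migration.
    """
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/"
        f"publishers/google/models/{model}:{method}"
    )
```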

Feature Flag Implementation

Implementation Steps

For implementing feature flags, refer to our Feature Flags Development Guidelines.

Feature flag implementations will affect self-hosted cloud-connected customers. These customers won’t receive the model upgrade until the feature flag is removed from the AI gateway codebase, as they won’t have access to the new GitLab release.

Model Selection Implementation

Implement model selection logic in:

  • AI gateway client (ee/lib/gitlab/llm/chain/requests/ai_gateway.rb)
  • Model definitions in AI gateway
  • Any custom implementations in specific features
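When model selection is spread across these three places, it helps to define a clear precedence. The sketch below assumes one reasonable order: a feature-specific override wins, and otherwise the feature flag chooses between the new and old defaults. `select_model` and its parameters are hypothetical and do not mirror the Ruby client's actual API.

```python
from typing import Mapping, Optional


def select_model(
    feature: str,
    feature_overrides: Mapping[str, str],
    flag_enabled: bool,
    new_default: str,
    old_default: str,
) -> str:
    """Resolve the model for one feature using a fixed precedence order."""
    # 1. A custom implementation pinned to a specific model takes priority.
    override: Optional[str] = feature_overrides.get(feature)
    if override is not None:
        return override
    # 2. Otherwise the feature flag decides between the new and old defaults.
    return new_default if flag_enabled else old_default
```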

Rollout Strategy

  1. Enable feature flag for small percentage of users/groups
  2. Monitor performance using:
  3. Gradually increase rollout percentage
  4. If issues arise, disable feature flag to rollback
  5. Once stable, remove feature flag
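Gradual rollouts depend on a deterministic, monotonic bucketing of users or groups: raising the percentage should only add actors, never reshuffle them. The sketch below shows that idea generically using CRC32. It is not GitLab's feature flag implementation, and `in_rollout` is a hypothetical name.

```python
import zlib


def in_rollout(flag_name: str, actor_id: int, percentage: float) -> bool:
    """Deterministic percentage-of-actors check.

    Hashing flag name + actor id gives each actor a stable bucket in [0, 100),
    so an actor enabled at 10% stays enabled at 50%, and 0% disables everyone
    (the rollback case).
    """
    bucket = zlib.crc32(f"{flag_name}{actor_id}".encode()) % 10_000 / 100  # 0.00 to 99.99
    return bucket < percentage
```

Including the flag name in the hash keeps separate flags from always enabling the same early cohort.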

Common Migration Scenarios

Simple Model Version Update (Same Provider)

Example: Upgrading from Claude 3.5 to Claude 3.7

AI Framework Team:

  • Create migration issue
  • Add model configuration file
  • Verify API compatibility
  • Ensure infrastructure support

Feature Teams:

  • Create implementation issues
  • Test prompts with new model
  • Implement feature flags
  • Monitor performance
  • Remove feature flags when stable

New Provider Integration

Example: Adding AWS Bedrock models

AI Framework Team:

  • Create integration plan
  • Implement provider API in AI gateway
  • Create model configuration files
  • Update authentication mechanisms
  • Document provider-specific parameters
  • Evaluate model performance

Feature Teams:

  • Evaluate feature quality and performance with the new model
  • Adapt prompts for new provider’s models
  • Implement feature flags
  • Deploy and monitor
  • Update documentation
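Adding a provider is easier to review when each provider implements a common interface and is registered by name. This sketch assumes such a registry. `ModelProvider`, `register`, and `BedrockProvider` are hypothetical names, and the request dictionary is illustrative rather than the actual Bedrock request body. Real Bedrock calls are signed with AWS Signature Version 4, which is omitted here.

```python
from abc import ABC, abstractmethod


class ModelProvider(ABC):
    """Hypothetical interface each provider integration implements."""

    @abstractmethod
    def auth_headers(self) -> dict[str, str]: ...

    @abstractmethod
    def build_request(self, model: str, prompt: str) -> dict: ...


PROVIDERS: dict[str, type[ModelProvider]] = {}


def register(name: str):
    """Class decorator that makes a provider available under a short name."""
    def decorator(cls: type[ModelProvider]) -> type[ModelProvider]:
        PROVIDERS[name] = cls
        return cls
    return decorator


@register("bedrock")
class BedrockProvider(ModelProvider):
    def __init__(self, region: str = "us-east-1"):
        self.region = region

    def auth_headers(self) -> dict[str, str]:
        # Placeholder: real requests need AWS SigV4 signing.
        return {}

    def build_request(self, model: str, prompt: str) -> dict:
        return {"modelId": model, "region": self.region, "prompt": prompt}


def get_provider(name: str, **kwargs) -> ModelProvider:
    cls = PROVIDERS.get(name)
    if cls is None:
        raise ValueError(f"no provider registered under {name!r}")
    return cls(**kwargs)
```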

Model Deprecation Response

Example: Replacing discontinued Vertex AI Code Gecko v2

AI Framework Team:

  • Create epic to track deprecation
  • Evaluate replacement models
  • Create model configuration
  • Document routing logic
  • Verify infrastructure compatibility

Feature Teams:

  • Implement routing logic
  • Create feature flags for transition
  • Run evaluations
  • Implement staged rollout
  • Monitor performance during transition

Troubleshooting Guide

Prompt Compatibility Issues

If you encounter prompt compatibility issues:

  1. Analyze Errors:

    • Enable “expanded AI logging” to capture model responses
    • Check for “LLM didn’t follow instructions” errors
    • Review model outputs for unexpected patterns
  2. Resolve Issues:

    • Create new prompt version (following semantic versioning)
    • Test prompt variations in evaluation environment
    • Use feature flags to control prompt deployment
    • Monitor performance during rollout
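A new prompt version follows the usual MAJOR.MINOR.PATCH rules. One reasonable convention, consistent with the Claude 3.7 example below, is a major bump when a prompt is rewritten for a new model, and minor or patch bumps for smaller adjustments. `bump` is a hypothetical helper.

```python
def bump(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH prompt version according to semantic versioning."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part {part!r}")
```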

Example: Claude 3.5 to 3.7 Migration

For Claude 3.7 migrations:

  • Create a new prompt definition at version 2.0.0
  • Implement feature flag for prompt version control
  • Use AI Framework team’s model configuration file
  • Run evaluations to verify performance
  • Roll out gradually and monitor
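The feature flag for prompt version control needs to pick the right definition from those available. This sketch assumes the flag switches between the 2.x line and everything below it. Versions are compared numerically, because plain string sorting would order `1.0.10` before `1.0.2`. `select_prompt_version` is a hypothetical helper.

```python
from typing import Iterable


def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(x) for x in version.split("."))
    return major, minor, patch


def select_prompt_version(available: Iterable[str], flag_enabled: bool, new_major: int = 2) -> str:
    """Highest version in the new major line when the flag is on, below it otherwise."""
    versions = sorted(available, key=parse)
    if flag_enabled:
        pool = [v for v in versions if parse(v)[0] == new_major]
    else:
        pool = [v for v in versions if parse(v)[0] < new_major]
    if not pool:
        raise LookupError("no prompt version matches the requested line")
    return pool[-1]
```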

AI Framework Team Migration Issue Template

The AI Framework team should create a main migration issue following this template:

```markdown
# [Model Name] Model Upgrade

## Overview
[Brief description of the new model and its improvements]

## Features to Update
[List of features affected by this migration, organized by category]

### Generally Available Features
- [Feature 1]
- [Feature 2]

### Beta Features
- [Beta Feature 1]

### Experimental Features
- [Experimental Feature 1]

## Required Changes
- Add model configuration file for model flexibility
- New prompt definition created to use the new model
- Feature flag created for controlled rollout

## Technical Details
- [Any technical specifics about this migration]
- [Impact on GitLab.com and self-managed instances]

## Implementation Steps
- [ ] Update model configurations in each feature
- [ ] Verify performance improvements
- [ ] Deploy updates
- [ ] Update documentation

## Timeline
Priority: [Priority level]

## References
- [Model Announcement]
- [Model Documentation]
- [GitLab Documentation]
- [Other relevant links]

## Proposed Solution
[Description of the high-level implementation approach]

## Implementation Details

Please follow the issues below with the associated rollout plans:

| Feature | DRI | ETA | Issue Link |
|---------|-----|-----|------------|
| [Feature 1] | [@username] | [Date] | [Issue link] |
| [Feature 2] | [@username] | [Date] | [Issue link] |
```

See an example in our Claude 3.7 Model Upgrade issue.

References