This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status Authors Coach DRIs Owning Stage Created
ongoing @DylanGriffith @mikolaj_wawrzyniak devops create 2024-05-17

GitLab Duo Workflow

Execution Environment

Executive Summary

The functionality that supports Duo Workflow needs to execute arbitrary code, which is effectively “untrusted” code. This means the components executing that code cannot run like any other service we deploy and, specifically, cannot run inside the AI Gateway.

In order to address this the Duo Workflow functionality will be comprised of 2 separate components:

  1. The Duo Workflow Service which is a Python based service we run in our infrastructure. This is built on top of LangGraph.
  2. The Duo Workflow Executor which is a Go binary that communicates via a long running gRPC connection to Duo Workflow Service and executes the arbitrary commands. It will be possible for users to run this locally or in CI pipelines

In our first release we will support 2 execution modes:

  1. Local Executor: which will run commands and edit files locally in a sandboxed Docker container on the developer machine. They will be able to see the files being edited live and it will be interactive
  2. CI Executor: For all non-local use cases of Duo Workflow (e.g. issue/epic based workflows) these will be triggered by the GitLab UI and will create a CI Pipeline to run the Duo Workflow Executor

Our architecture will also support mixed deployments for self-managed such that some features of Duo Workflow will be available using a cloud hosted AI Gateway.

Detailed plan

We plan on building this feature set with 3 independent components that can be run in multiple runtimes:

  1. The Duo Workflow Web UI. This will be a web UI built into GitLab that manages the creation and interaction of all workflows. There may be many interaction points in the GitLab application but there should be a central workflow UI with reusable components (e.g. Vue components) that could be embedded into our editor extensions
  2. The Duo Workflow Service. This will be a Python based service we deploy with a gRPC API. The only interface to this will be the gRPC interface which is called from the Duo Workflow Executor. Internally this will use LangGraph to execute the workflows. It will not have any persisted state but the state of running workflows will be kept in memory and periodically checkpointed in GitLab.
  3. The Duo Workflow Executor. This will be written in Go for easy installation in development containers. This component will run in CI jobs or on a user’s local workstation. In the local workstation it will run sandboxed in a Docker container with the working directory optionally mounted by the user for a live pairing experience. It will only be responsible for opening a gRPC connection to Duo Workflow Service and executing the commands it is told to.

The following are important constraints of the architecture:

  1. All state management for workflows will be inside GitLab.
  2. Duo Workflow Service is expected to periodically checkpoint its state in GitLab
  3. Duo Workflow Service in-memory state can be dropped/lost at any time so checkpointing will be the only guaranteed point that can be returned to
  4. If a local Duo Workflow Executor drops its connection then the Duo Workflow Service will checkpoint and shut down the workflow as soon as it reaches a point where it is waiting on the executor
  5. In order to avoid multiple Duo Workflow Service instances running on the same workflow the Duo Workflow Service will always acquire a lock with GitLab before it starts running. When it suspends it will release the lock and similarly there will be a timeout state if it has not checkpointed in the last 60 seconds. GitLab will not accept checkpoints from a timed out run of the Duo Workflow Service.
  6. Each time a Duo Workflow Service resumes a workflow it gets a new ID and this is sent when checkpointing so that GitLab can drop/ignore zombie services running the workflow and inform the zombie service to shut down.
  7. Code is checkpointed by the executor pushing hidden Git refs to the GitLab instance. This will be happening on the same frequency as other checkpoints.
  8. For local execution Duo Workflows are initiated using the Duo Workflow Executor directly calling Duo Workflow Service
  9. For workflows triggered via the UI that don’t require a Duo Workflow Executor GitLab can call the Duo Workflow Service directly
  10. All API calls from Duo Workflow Service to GitLab that access private data or update data will be authenticated on behalf of the user that created the workflow. Duo Workflow Service should not need privileged access to GitLab
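
The locking, timeout, and zombie-rejection constraints above can be sketched as a small state machine. This is only an illustrative model in Python, not the GitLab implementation: the `WorkflowLock` class, its method names, and the use of a random hex string as the run ID are all assumptions. Only the 60 second timeout comes from the text.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

CHECKPOINT_TIMEOUT_SECONDS = 60  # timeout stated in the constraints above


@dataclass
class WorkflowLock:
    """Hypothetical GitLab-side lock record for a single workflow."""

    holder_run_id: Optional[str] = None
    last_checkpoint_at: float = 0.0

    def _timed_out(self, now: float) -> bool:
        return now - self.last_checkpoint_at > CHECKPOINT_TIMEOUT_SECONDS

    def acquire(self, now: float) -> Optional[str]:
        """Return a fresh run ID, or None if a live run still holds the lock."""
        if self.holder_run_id is not None and not self._timed_out(now):
            return None
        # Every (re)start of the workflow gets a new ID.
        self.holder_run_id = uuid.uuid4().hex
        self.last_checkpoint_at = now
        return self.holder_run_id

    def checkpoint(self, run_id: str, now: float) -> bool:
        """Accept a checkpoint only from the current, non-timed-out run."""
        if run_id != self.holder_run_id or self._timed_out(now):
            return False  # zombie or timed-out run: the caller should shut down
        self.last_checkpoint_at = now
        return True

    def release(self, run_id: str) -> None:
        """Release the lock when the holding run suspends."""
        if run_id == self.holder_run_id:
            self.holder_run_id = None


lock = WorkflowLock()
first = lock.acquire(now=0.0)
lock.checkpoint(first, now=30.0)    # accepted
second = lock.acquire(now=100.0)    # first run timed out, so a new run may start
lock.checkpoint(first, now=101.0)   # rejected: first is now a zombie
```

Time is passed in explicitly so the behaviour is easy to test; a real implementation would more likely rely on database timestamps.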

CI pipelines have been chosen as the hosted runtime option for Duo Workflow Executor because it is the only infrastructure we have available today to run untrusted customer workloads with stability, support, security, abuse prevention and a billing model. In the short term for early customers we may rely on the existing compute minutes for CI pipelines but in the long run we may want to deploy dedicated runners and introduce a billing model specific for Duo Workflow.

For many development use cases we expect developers may prefer to run Duo Workflow Executor locally as it can operate on a locally mounted directory and allow the user to more easily watch changes as they happen.

High level architecture

Backend architecture

(this PNG can be edited in Excalidraw)

  1. Initially we focus on running locally and in CI pipelines with all inputs as environment variables
  2. State stored in GitLab so it can be accessed from the web UI and through IDE extensions

Self-managed architecture

With local AI Gateway

When customers are running the AI Gateway locally the architecture will be very similar to GitLab.com. This will also allow them to use whatever customer models they configure in their AI Gateway.

With cloud AI Gateway

In order to allow self-managed customers to trial and rapidly adopt Duo Workflow without running all AI Gateway components this architecture will support a mixed deployment mode. In this case we assume that the cloud AI Gateway will not have access to the customer’s GitLab instance but we can make use of the local executor (on the user’s machine or in a CI runner) to proxy all interactions with GitLab.

Data flow

The diagram below shows what happens when the user triggers workflows from their IDE using a local executor. The architecture will be similar when triggering from the GitLab UI using CI pipelines, except that GitLab will start a CI pipeline to run the Duo Workflow Executor and create the workflow.

sequenceDiagram
    participant user as User
    participant ide as IDE
    participant executor as Duo Workflow Executor
    participant gitlab_rails as GitLab Rails
    box AI-gateway service
        participant duo_workflow_service as Duo Workflow Service
        participant ai_gateway as AI Gateway
    end
    participant llm_provider as LLM Provider
    user->>ide: trigger workflow from IDE
    ide->>executor: start executor
    executor->>+duo_workflow_service: Solve this issue
    duo_workflow_service->>gitlab_rails: Create the workflow
    duo_workflow_service->>llm_provider: Ask LLM what to do
    llm_provider->>duo_workflow_service: Need the file list
    duo_workflow_service->>executor: execute `ls`
    duo_workflow_service->>gitlab_rails: Save checkpoint
    executor->>duo_workflow_service: result `ls`
    duo_workflow_service->>llm_provider: What's next?
    llm_provider->>duo_workflow_service: Here's a patch
    duo_workflow_service->>executor: execute `git apply`
    duo_workflow_service->>gitlab_rails: Save checkpoint
    duo_workflow_service->>executor: execute `poetry run pytest`
    duo_workflow_service->>gitlab_rails: Save checkpoint
    executor->>duo_workflow_service: result `poetry run pytest`
    duo_workflow_service->>llm_provider: fix the tests
    llm_provider->>duo_workflow_service: Here's a patch
    duo_workflow_service->>executor: execute `git apply`
    duo_workflow_service->>gitlab_rails: Save checkpoint
    duo_workflow_service->>executor: execute `poetry run pytest`
    executor->>duo_workflow_service: result `poetry run pytest`
    duo_workflow_service->>executor: Next step?
    executor->>gitlab_rails: Check in & Next step?
    gitlab_rails->>executor: Last step!
    executor->>duo_workflow_service: Done!
    deactivate duo_workflow_service
    gitlab_rails->>user: Workflow done!
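
The core loop in the sequence above (ask the LLM, have the executor run a command, checkpoint, repeat) might look roughly like the following Python sketch. Everything here is a simplified assumption: `run_workflow`, the callback signatures, and the scripted stand-in for the LLM are hypothetical, the gRPC transport is replaced by plain function calls, and the checkpoint is saved after each result rather than while the command is still running.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str]  # (command sent to the executor, result returned)


def run_workflow(
    ask_llm: Callable[[List[Step]], Optional[str]],
    execute: Callable[[str], str],
    save_checkpoint: Callable[[List[Step]], None],
) -> List[Step]:
    """Drive one workflow until the LLM has no further command to run."""
    history: List[Step] = []
    while True:
        command = ask_llm(history)        # "Ask LLM what to do"
        if command is None:
            return history                # nothing left to do
        result = execute(command)         # executor runs the command
        history.append((command, result))
        save_checkpoint(list(history))    # "Save checkpoint" in GitLab


# Stand-ins for the LLM provider, the executor, and GitLab Rails.
script = ["ls", "git apply", "poetry run pytest"]


def scripted_llm(history: List[Step]) -> Optional[str]:
    return script[len(history)] if len(history) < len(script) else None


checkpoints: List[List[Step]] = []
steps = run_workflow(scripted_llm, lambda cmd: f"ran {cmd}", checkpoints.append)
```

Because the service only depends on three callbacks, the same loop can be pointed at a local executor or a CI-hosted one without changes.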

CI Pipeline architecture

We don’t want users to have to configure a specific .gitlab-ci.yml in order to support Duo Workflow. In order to avoid this we’ll use the same approach as that used by DAST site validations which dynamically constructs a pipeline configuration in GitLab and triggers the pipeline without using any .gitlab-ci.yml.
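
As a rough illustration of constructing a pipeline configuration dynamically, the Python sketch below builds a single-job configuration as a dictionary and serializes it as JSON (which is also valid YAML). The job name, image, variable names, runner tag, and executor command are all hypothetical placeholders. Only the `image`, `variables`, `script`, and `tags` keywords are standard GitLab CI job keywords.

```python
import json
from typing import Any, Dict


def build_workflow_pipeline_config(workflow_id: str, goal: str) -> Dict[str, Any]:
    """Build an in-memory CI configuration with one job that runs the executor."""
    return {
        "duo_workflow": {  # hypothetical job name
            "image": "registry.example.com/duo-workflow-executor:latest",  # placeholder image
            "variables": {
                # Inputs are passed as environment variables (names are hypothetical).
                "DUO_WORKFLOW_ID": workflow_id,
                "DUO_WORKFLOW_GOAL": goal,
            },
            "script": ["duo-workflow-executor"],  # hypothetical binary name
            "tags": ["duo-workflow"],  # hypothetical tag for dedicated runners
        }
    }


config = build_workflow_pipeline_config("123", "Fix the failing test")
print(json.dumps(config, indent=2))
```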

CI Pipelines must also be run inside a project. There will be some use cases of Duo Workflow where there is no appropriate project in which to run the pipeline (e.g. bootstrapping a new project). For these workflows we will:

  1. Initially require the user to have a default Workflow project created. It can just be any empty project and we’ll automatically run the pipeline there.
  2. If this proves to be too much setup we’ll automate the creation of a default Duo Workflow project for you
  3. If the UX is poor over time we might abstract the user away from the existence of the Project altogether and make this an implementation detail. This will be considered a last resort because it could be quite a wide impacting change to GitLab as projects are a central part of GitLab.

Considerations for CI Runners and Infrastructure

  1. Our Duo Workflow rollout may involve substantial increases to our CI runner usage
  2. Duo Workflow will likely involve running long running CI pipelines that use very little CPU. Mostly what they will be doing is communicating back and forth with the LLMs and users in a long running gRPC connection.
  3. Users will expect very low latency for CI Runner startup
    1. We should determine if there are ways to have preloaded VMs with our Docker images running ready to start a pipeline when a workflow is triggered
  4. We likely want a set of CI Runners that are just for Duo Workflow. This may mean enabling the runners to a subset of customers or just using appropriate job labeling/runner matching to only use these runners for Duo Workflow
  5. It might be possible to roll out some Duo Workflow features on our existing runner fleets but we believe there will be enough benefits to invest in segregating these runners.

State checkpointing

The Duo Workflow state will be persisted in GitLab-Rails as the Duo Workflow Service works. There are 2 components to state:

  1. The State object being managed by LangGraph. This includes all prompt history between user and agents and any other metadata created by the LangGraph graph
  2. The working directory where the agent is writing code.

We will have data retention limits on all state. We will use PostgreSQL partitioning to drop old workflow data after some time and we will also drop old Git refs after some time.

We will persist the LangGraph state object to PostgreSQL, using APIs in GitLab, as the workflow progresses. The API will follow LangGraph conventions and identify each checkpoint with a thread_ts, as implemented in the POC https://gitlab.com/gitlab-org/gitlab/-/merge_requests/153551.
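
A minimal in-memory model of such a checkpoint API is sketched below in Python. It is not the POC implementation: `CheckpointStore` and its methods are hypothetical, the `thread_ts` values are assumed to be ISO 8601 UTC strings in a single fixed format (so they sort chronologically as strings), and `drop_older_than` merely stands in for whatever retention mechanism is used.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class CheckpointStore:
    """In-memory stand-in for the PostgreSQL-backed checkpoint API."""

    # workflow_id -> {thread_ts -> serialized LangGraph state}
    rows: Dict[int, Dict[str, Any]] = field(default_factory=dict)

    def save(self, workflow_id: int, thread_ts: str, state: Any) -> None:
        self.rows.setdefault(workflow_id, {})[thread_ts] = state

    def latest(self, workflow_id: int) -> Optional[Tuple[str, Any]]:
        """Return the newest checkpoint, the only point a workflow can resume from."""
        checkpoints = self.rows.get(workflow_id)
        if not checkpoints:
            return None
        newest = max(checkpoints)  # fixed-format ISO 8601 strings sort by time
        return newest, checkpoints[newest]

    def drop_older_than(self, cutoff_ts: str) -> int:
        """Delete checkpoints older than the cutoff and return how many were dropped."""
        dropped = 0
        for checkpoints in self.rows.values():
            for ts in [ts for ts in checkpoints if ts < cutoff_ts]:
                del checkpoints[ts]
                dropped += 1
        return dropped


store = CheckpointStore()
store.save(1, "2024-05-17T10:00:00+00:00", {"step": 1})
store.save(1, "2024-05-17T10:00:30+00:00", {"step": 2})
print(store.latest(1))
```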

For the current working directory which contains the code the agent has written so far we will store this by pushing hidden Git refs to GitLab for the checkpoint. Each checkpoint will have an associated ref and a checkpoint naming convention (or something stored in PostgreSQL) will allow us to identify the appropriate Git ref for the state checkpoint.
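
One possible naming convention is sketched below in Python. The `refs/workflows/...` namespace and both helper functions are assumptions made for illustration and are not a decided format. The sketch relies on two known Git behaviours: ref names cannot contain `:`, and `git push <remote> <sha>:<full-ref>` creates or updates an arbitrary ref on the remote.

```python
from typing import List

REF_PREFIX = "refs/workflows"  # hypothetical namespace outside refs/heads and refs/tags


def checkpoint_ref(workflow_id: int, thread_ts: str) -> str:
    """Derive the Git ref name for a checkpoint from its workflow ID and thread_ts."""
    safe_ts = thread_ts.replace(":", "-")  # ':' is not allowed in Git ref names
    return f"{REF_PREFIX}/{workflow_id}/checkpoints/{safe_ts}"


def push_checkpoint_command(remote: str, commit_sha: str, ref: str) -> List[str]:
    """Build the argv that pushes one commit to the checkpoint ref."""
    return ["git", "push", remote, f"{commit_sha}:{ref}"]


ref = checkpoint_ref(42, "2024-05-17T10:00:00+00:00")
print(ref)
print(push_checkpoint_command("origin", "abc1234", ref))
```

Deriving the ref from the checkpoint identifier means no extra lookup table is needed to go from a state checkpoint to its code snapshot.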

Storing in Git has the advantage that we don’t need to build any new API for storing artifacts and it’s very easy for the user to access the code by just checking out that SHA. It also has huge storage savings where a workflow is working on an existing large project. Ultimately we expect code changes to end up being pushed to Git anyway so this is the simplest solution.

Some Duo Workflows do not have an existing project (e.g. bootstrapping a project). Even those workflows will need to be triggered from some project (as explained in the section about CI pipelines). As such we can use the workflow project as a temporary repository to store the snapshots of code generated by the workflow.

We should also consider cleaning up Git refs after some workflow expiration period.

Options we’ve considered and pros/cons

Delegate only unsafe execution to local/CI pipelines

This was the option we chose. It attempts to keep as much of the functionality as possible in services we run while delegating the unsafe execution to Duo Workflow Executor which can run locally or in CI pipelines.

Pros:

  1. Running the infrastructure ourselves gives us more control over the versions being rolled out
  2. There are fewer dependencies the user needs to install for local usage
  3. It offers a rapid onboarding experience for self-managed customers to try Duo Workflow without deploying any new GitLab components

Cons:

  1. We need to deploy and maintain new infrastructure which has different scaling characteristics to other services we run due to long running execution

Run it locally

Pros:

  1. This keeps developers in their local environment where most of them work
  2. Compute is absorbed by the local developer so they don’t have to worry about being billed per minute
  3. Low latency for user interaction especially where the user needs to review/edit code while the agent is working

Cons:

  1. There are more risks running it locally unless you have an isolated development environment as commands have full access to your computer. This can be mitigated by UX that limits what commands the agent can run without user confirmation.
  2. This approach will require some local developer setup and may not be suited to tasks that users are expecting to kick off from the web UI (e.g. issue/epic planning)

CI pipelines (on CI runners)

See https://gitlab.com/gitlab-org/gitlab/-/issues/457959 for a POC and investigation.

Pros:

  1. CI pipelines are the only pre-configured infrastructure we have that can run untrusted workflows
  2. We have an established billing model for CI minutes

Cons:

  1. CI pipelines are slow to start up and this might mean that iteration and incremental AI development might be slow if the pipelines need to be restarted while timing out waiting for user input
  2. CI minutes will be consumed while the agent is waiting for user input. This will likely require a timeout mechanism and as such if the user returns we’ll need to start a new pipeline when they give input
  3. CI pipelines run in a difficult to access environment (i.e. you cannot SSH into it or introspect it live) and as such it may make it difficult for users to interact with code that is being built out live in front of them without
  4. CI pipelines require there to be some project to run in. This is not likely something we can overcome but we may be able to simplify the setup process by automatically creating a “workflow project” for your workflow pipelines to run in
  5. When we implement non-code workflows (e.g. reviewing MRs) there is no need for an isolated compute environment but we’ll still be forcing customers to use compute minutes. We’ve seen this is not a good experience in other cases like X-Ray reports

GitLab workspaces (remote development)

See https://gitlab.com/gitlab-org/gitlab/-/issues/458339 for a POC and investigation.

Pros:

  1. This has the fastest iteration cycle as the agent is working locally in your development environment and can interact with you, and you can even see and edit the same files live alongside the agent
  2. Customers can run it on their own infrastructure and this gives them control over efficient resource usage

Cons:

  1. Today we only support customers bringing their own infrastructure (K8s cluster) and this means that the barrier to getting started is to bring your own K8s cluster and this is a fairly significant effort
  2. If we wanted to build out infrastructure on GitLab.com to save customers having to bring their own K8s cluster this would be a fairly large effort from a security and infrastructure perspective. It’s possible, but dealing with all the complexities of security, abuse and billing would require many teams’ involvement in both initial development and sustained maintenance.

Security

Threat modeling

See https://gitlab.com/gitlab-com/gl-security/product-security/appsec/threat-models/-/issues/46.

Security considerations for local execution

Local execution presents the highest value opportunity for developers but also comes with the greatest risk that a bug or mistake from an LLM could cause significant harm to a user’s local development environment or compromise confidential information.

Some examples of risks:

  1. An AI that can make honest but significant mistakes
  2. An AI that might sometimes be adversarial
  3. The AI gateway serving the LLM responses may be compromised which would then allow shell access to all users of this tool

Sandboxing Duo Workflow Executor

One proposal here to mitigate risks would be to use some form of sandboxing where the Duo Workflow Executor is only able to run inside of an unprivileged Docker container. Such a solution would need to:

  1. Mount the local working directory into the container so it is still editing the files the user is working on in the host
  2. Install all development dependencies the user or agent would need to run the application and tests

The above option may also make use of Dev Containers.
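
As a sketch of what launching the executor in such a container could involve, the Python helper below assembles a `docker run` command line. The function name, image, `/workspace` mount point, and the specific hardening flags chosen are assumptions for illustration. The flags themselves (`--rm`, `--cap-drop`, `--security-opt no-new-privileges`, `--volume`, `--workdir`) are standard `docker run` options.

```python
from pathlib import Path
from typing import List


def sandbox_command(workdir: str, image: str, executor_args: List[str]) -> List[str]:
    """Build the argv for running the executor in an unprivileged container."""
    host_dir = Path(workdir).resolve()
    return [
        "docker", "run", "--rm",
        "--cap-drop", "ALL",                      # drop all Linux capabilities
        "--security-opt", "no-new-privileges",    # block privilege escalation
        "--volume", f"{host_dir}:/workspace",     # mount the user's working directory
        "--workdir", "/workspace",
        image,                                    # must already contain dev dependencies
        *executor_args,
    ]


# Placeholder image and arguments; nothing is executed here.
print(sandbox_command(".", "registry.example.com/duo-workflow-executor:latest", ["--help"]))
```

Building the command as a list rather than a shell string avoids quoting problems with paths that contain spaces.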

User confirmation for commands

Another option for limiting the risk is to require the user to confirm every command the agent executes before it runs the command. We will likely be implementing this as an option anyway but given the desire for efficient development of larger workflows it might limit the efficiency of the tool if it needs to execute a lot of commands to finish a task.

We may also consider a hybrid approach where there is a set of user-defined allowlisted commands (e.g. ls and cat) which allow the agent to read and learn about a project without the user needing to confirm. This approach may not solve all needs though, as the user may want to allowlist commands like rspec which then effectively still allow for arbitrary code execution because the agent can put whatever it wants in the spec file.
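
A hybrid check of this kind could be sketched as follows in Python. The `needs_confirmation` function and the conservative rule of always asking when shell metacharacters are present are assumptions for illustration and are not a vetted security control.

```python
import shlex
from typing import Iterable

# Characters that could chain or substitute commands; their presence forces confirmation.
SHELL_METACHARACTERS = set(";|&<>$`\n()")


def needs_confirmation(command: str, allowlist: Iterable[str]) -> bool:
    """Return True unless the command is a plain invocation of an allowlisted program."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return True
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return True
    if not argv:
        return True
    return argv[0] not in set(allowlist)


allowed = {"ls", "cat"}
print(needs_confirmation("ls -la", allowed))        # False
print(needs_confirmation("rm -rf build", allowed))  # True
print(needs_confirmation("ls; rm -rf /", allowed))  # True
```

Note that this only inspects the program name. As described above, allowlisting a test runner such as rspec would still let the agent run arbitrary code through files it has written.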

Duo Workflow UI

The Duo Workflow UI will need to be available at least in the following places:

  1. In GitLab Rails web UI
  2. In our editor extensions

The need for multiple UIs, together with the multiple execution environments for Duo Workflow Executor described above, has led to the following decisions.

How do we package and run the local web UI

We will build the majority of data access related to our local IDE UI into the GitLab Language Server to maximize re-use across all our editor extensions. We will also employ a mix of webviews rendered in the IDE and served by the LSP as well as native IDE UI elements. Where it doesn’t considerably limit our user experience we’ll opt to build the interface into a web page served from the LSP and then rendered in the IDE as a web view because this again maximises re-use across all our editor extensions.

How does the web UI reflect the current state live

The Duo Workflow Service will persist its state frequently to the main GitLab Rails application. There will be GraphQL subscriptions for streaming updates about a workflow. The UI will consume these GraphQL APIs and update the UI as updates stream in.

Given that the user may be running the Duo Workflow Executor locally, which may be seeing some of the state as it happens, it might be reasonable to want to just live render the in-memory state of the running workflow process. We may choose this option deliberately for latency reasons but we need to be careful to architect the frontend and Duo Workflow Executor as completely decoupled because they will not always be running together. For example users may trigger a workflow locally which runs in GitLab CI or they may be using the web UI to interact with and re-run a workflow that was initiated locally.

As such we will generally prefer not to have direct interaction between the UI and Executor but instead all communication should be happening via GitLab. Any exceptions to this might be considered case by case but we’ll need clear API boundaries which allow the functionality to easily be changed to consume from GitLab for the reasons described.

Duo Workflow Agent’s tools

Duo Workflow agents are, in a simplified view, a pair of: prompt and LLM. By this definition, agents on their own are not able to interact with the outside world, which significantly limits the scope of work that can be automated. To overcome this limitation, agents are being equipped with tools.

Tools are functions that agents can invoke using the function calling LLM feature. These functions perform different actions on behalf of the agent. For example, an agent might be equipped with a tool (function) that executes bash commands like ls or cat and returns the result of those bash commands back to the agent.
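
The idea can be illustrated with a small tool registry in Python. This is a sketch only: `make_tools`, `dispatch`, and the shape of the tool-call dictionary (`name` plus `arguments`) are assumptions and are not tied to any particular LLM provider's function calling format. The `execute` callback stands in for forwarding a command to the Duo Workflow Executor.

```python
import shlex
from typing import Any, Callable, Dict

Tool = Callable[..., str]


def make_tools(execute: Callable[[str], str]) -> Dict[str, Tool]:
    """Create the tool set; every tool delegates to the injected executor callback."""

    def run_command(command: str) -> str:
        return execute(command)

    def read_file(path: str) -> str:
        return execute(f"cat {shlex.quote(path)}")  # quote to keep the path one argument

    return {"run_command": run_command, "read_file": read_file}


def dispatch(tools: Dict[str, Tool], tool_call: Dict[str, Any]) -> str:
    """Run the tool requested by the LLM and return its result as text."""
    name = tool_call["name"]
    if name not in tools:
        return f"error: unknown tool {name}"
    return tools[name](**tool_call["arguments"])


# A fake executor that echoes the command instead of running it.
tools = make_tools(lambda command: f"ran: {command}")
print(dispatch(tools, {"name": "run_command", "arguments": {"command": "ls"}}))
print(dispatch(tools, {"name": "read_file", "arguments": {"path": "a b.txt"}}))
```

Returning an error string for unknown tools, rather than raising, lets the result be fed back to the agent so it can try a different tool.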

The breadth of the tool set available to agents defines the scope of work that can be automated. Therefore, to set up the Duo Workflow feature for success, it will be required to deliver a broad and exhaustive tool set.

Foreseen tools include:

  1. Tools to execute bash commands via the Duo Workflow Executor
  2. Tools to manipulate files (including reading and writing to files)
  3. Tools to manipulate Git VCS
  4. Tools to integrate with the GitLab HTTP API

The fact that the Duo Workflow Service is going to require Git and GitLab API tools entails that the Duo Workflow Service must have the ability to establish an SSH connection and make HTTP requests to the GitLab instance. This ability can be granted directly to the Duo Workflow Service or can be provided via the Duo Workflow Executor if a direct connection between the Duo Workflow Service and a GitLab instance is not possible due to a firewall or network partition.

Milestones

  1. All the components implemented and communicating correctly with only a trivial workflow implemented
  2. Checkpointing code as well as LangGraph state
  3. Workflow locking in GitLab to ensure only 1 concurrent instance of a workflow
  4. Add more workflows and tools
  5. Ability to resume a workflow

POC - Demos

  1. POC: Solve issue (internal only)
  2. POC: Duo Workflow in Workspaces (internal only)
  3. POC: Autograph using Docker Executor (internal only)
  4. POC: Duo Workflows in CI pipelines with timeout and restart (internal only)