Evaluation runner

Evaluation runner (evaluation-runner) allows GitLab employees to run evaluations on specific GitLab AI features with one click.

You can run evaluations against GitLab.com and against GitLab-supported self-hosted models.
To view the AI features that are currently supported, see Evaluation pipelines.

Evaluation runner spins up a new GDK instance in a remote environment, runs an evaluation against it, and reports the result.

For more details, view the evaluation-runner repository.

Architecture

flowchart LR
  subgraph EV["Evaluators"]
    PL(["PromptLibrary/ELI5"])
    DSIN(["Input Dataset"])
  end

  subgraph ER["EvaluationRunner"]
    CI["CI/CD pipelines"]
    subgraph GDKS["Remote GDKs"]
        subgraph GDKM["GDK-master"]
          bl1["GitLab Duo features on master branch"]
          fi1["fixtures (Issue, MR, etc.)"]
        end
        subgraph GDKF["GDK-feature"]
          bl2["GitLab Duo features on feature branch"]
          fi2["fixtures (Issue, MR, etc.)"]
        end
    end
  end

  subgraph MR["MergeRequests"]
    GRMR["GitLab-Rails MR"]
    GRAI["AI Gateway MR"]
  end

  MR -- [1] trigger --- CI
  CI -- [2] spins up --- GDKS
  PL -- [3] get responses and evaluate --- GDKS
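
The three numbered steps in the diagram above can be read as a simple sequence. The following is a minimal Python sketch of that sequence, not the evaluation-runner implementation. The class RemoteGDK, the functions trigger_pipeline, spin_up_gdks, and evaluate, the branch name my-feature, and the sample prompt are all hypothetical names chosen for illustration; the responses are simulated rather than fetched from a real GDK.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteGDK:
    """Hypothetical stand-in for one remote GDK instance from the diagram."""
    name: str    # "GDK-master" or "GDK-feature"
    branch: str  # branch whose GitLab Duo features are deployed
    fixtures: list = field(default_factory=lambda: ["Issue", "MR"])

    def respond(self, prompt: str) -> str:
        # A real instance would call a GitLab Duo feature; this only echoes.
        return f"[{self.branch}] response to: {prompt}"


def trigger_pipeline(merge_request: str) -> dict:
    """Step [1]: a merge request triggers a CI/CD pipeline."""
    return {"merge_request": merge_request, "status": "running"}


def spin_up_gdks(pipeline: dict, feature_branch: str) -> list:
    """Step [2]: the pipeline spins up the remote GDKs."""
    return [
        RemoteGDK("GDK-master", "master"),
        RemoteGDK("GDK-feature", feature_branch),
    ]


def evaluate(gdks: list, dataset: list) -> dict:
    """Step [3]: the evaluator collects a response from each GDK per input."""
    return {gdk.name: [gdk.respond(prompt) for prompt in dataset] for gdk in gdks}


pipeline = trigger_pipeline("GitLab-Rails MR")
gdks = spin_up_gdks(pipeline, feature_branch="my-feature")
results = evaluate(gdks, ["Summarize this issue"])

for name, responses in results.items():
    print(name, responses)
```

Keeping one response list per GDK instance mirrors the diagram's split between GDK-master and GDK-feature. How the real evaluators score or compare those responses is not described here, so the sketch stops at collecting them.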