Evaluation runner
Evaluation runner (evaluation-runner) allows GitLab employees to run evaluations on specific GitLab AI features with one click.
- You can run the evaluation on GitLab.com and GitLab-supported self-hosted models.
- To view the AI features that are currently supported, see Evaluation pipelines.
Evaluation runner spins up a new GDK (GitLab Development Kit) instance in a remote environment, runs an evaluation against it, and reports the result.
For more details, view the evaluation-runner repository.
Architecture
flowchart LR
  subgraph EV["Evaluators"]
    PL(["PromptLibrary/ELI5"])
    DSIN(["Input Dataset"])
  end
  subgraph ER["EvaluationRunner"]
    CI["CI/CD pipelines"]
    subgraph GDKS["Remote GDKs"]
      subgraph GDKM["GDK-master"]
        bl1["Duo features on master branch"]
        fi1["fixtures (Issue,MR,etc)"]
      end
      subgraph GDKF["GDK-feature"]
        bl2["Duo features on feature branch"]
        fi2["fixtures (Issue,MR,etc)"]
      end
    end
  end
  subgraph MR["MergeRequests"]
    GRMR["GitLab-Rails MR"]
    GRAI["AI Gateway MR"]
  end
  MR -- [1] trigger --- CI
  CI -- [2] spins up --- GDKS
  PL -- [3] get responses and evaluate --- GDKS
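The three numbered steps in the diagram can be read as a small pipeline. The following is a minimal Python sketch of that flow, not the actual evaluation-runner implementation: `RemoteGDK`, `spin_up_gdks`, `evaluate`, and `run_pipeline` are hypothetical names, the branch name and dataset entry are made up, and the Duo feature responses are stubbed with plain functions instead of calls to a real GDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RemoteGDK:
    """Hypothetical stand-in for a remote GDK instance seeded with fixtures."""
    name: str
    branch: str
    answer: Callable[[str], str]  # stub for a Duo feature on this branch


def spin_up_gdks(feature_branch: str) -> List[RemoteGDK]:
    # [2] CI/CD spins up one GDK per branch under comparison.
    # The lambdas are placeholders for real Duo feature responses.
    return [
        RemoteGDK("GDK-master", "master",
                  lambda q: f"master answer to: {q}"),
        RemoteGDK("GDK-feature", feature_branch,
                  lambda q: f"feature answer to: {q}"),
    ]


def evaluate(gdks: List[RemoteGDK], dataset: List[str]) -> Dict[str, List[str]]:
    # [3] The evaluator sends each dataset input to each GDK and
    # collects the responses, keyed by GDK name, for later scoring.
    return {gdk.name: [gdk.answer(q) for q in dataset] for gdk in gdks}


def run_pipeline(feature_branch: str, dataset: List[str]) -> Dict[str, List[str]]:
    # [1] A merge request triggers the pipeline, which then runs
    # steps [2] and [3] in order.
    gdks = spin_up_gdks(feature_branch)
    return evaluate(gdks, dataset)


responses = run_pipeline("my-feature-branch", ["Summarize this issue"])
for gdk_name, answers in responses.items():
    print(gdk_name, answers)
```

Keeping the master and feature responses side by side, keyed by GDK name, mirrors the diagram's two remote GDKs: an evaluator can then compare the feature branch against the master baseline on the same input dataset.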