Evaluation runner
Evaluation runner (evaluation-runner
) allows GitLab employees to run evaluations on specific GitLab AI features with one click.
- You can run the evaluation on GitLab.com and GitLab-supported self-hosted models.
- To view the AI features that are currently supported, see Evaluation pipelines.
Evaluation runner spins up a new GDK instance on a remote environment, runs an evaluation, and reports the result.
For more details, view the
evaluation-runner
repository.
Architecture
flowchart LR
subgraph EV["Evaluators"]
PL(["PromptLibrary/ELI5"])
DSIN(["Input Dataset"])
end
subgraph ER["EvaluationRunner"]
CI["CI/CD pipelines"]
subgraph GDKS["Remote GDKs"]
subgraph GDKM["GDK-master"]
bl1["Duo features on master branch"]
fi1["fixtures (Issue,MR,etc)"]
end
subgraph GDKF["GDK-feature"]
bl2["Duo features on feature branch"]
fi2["fixtures (Issue,MR,etc)"]
end
end
end
subgraph MR["MergeRequests"]
GRMR["GitLab-Rails MR"]
GRAI["AI Gateway MR"]
end
MR -- [1] trigger --- CI
CI -- [2] spins up --- GDKS
PL -- [3] get responses and evaluate --- GDKS