SBoM dependency graph ingestion overview
Overview
The process starts after all Sbom::Occurrence models have been ingested. Occurrences are ingested in slices, and building the dependency graph slice by slice as well would be tricky.
All work happens in a background worker, which will be added in a subsequent MR, so that we do not increase the time it takes to ingest an SBoM report. This means there will be a delay between when the SBoM report is ingested and when the dependency graph is updated.
All records pertaining to dependency graphs are stored in the sbom_graph_paths database table, which has foreign keys to sbom_occurrences as well as projects for easier filtering.
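To illustrate what a closure-style layout stores, here is a minimal Ruby sketch that derives one row per reachable (ancestor, descendant) pair from a small graph. The GraphPath struct, its field names, and the closure_rows helper are hypothetical and chosen for illustration only; they are not the actual schema of sbom_graph_paths or the real ingestion code.

```ruby
# Hypothetical, simplified stand-in for a row in sbom_graph_paths.
# The field names are illustrative assumptions, not the real columns.
GraphPath = Struct.new(:project_id, :ancestor_id, :descendant_id, :path_length)

# edges: Hash of occurrence id => array of ids it depends on directly.
# Returns one row per reachable (ancestor, descendant) pair, using the
# shortest path length found by a breadth-first walk.
def closure_rows(project_id, edges)
  rows = []
  edges.each_key do |start|
    queue = edges.fetch(start, []).map { |child| [child, 1] }
    seen = {}
    until queue.empty?
      node, depth = queue.shift
      next if seen[node]

      seen[node] = true
      rows << GraphPath.new(project_id, start, node, depth)
      edges.fetch(node, []).each { |child| queue << [child, depth + 1] }
    end
  end
  rows
end

# Occurrence 1 depends on 2, which depends on 3.
edges = { 1 => [2], 2 => [3], 3 => [] }
rows = closure_rows(42, edges)
rows.each { |row| puts row.to_a.inspect }
```

Because every row carries the project identifier, all paths for one project can be selected with a single filter, which is the kind of lookup the foreign key to projects is meant to make easy.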
Implementation details
This feature is a work in progress, so this document can get out of date.
- Sbom::Ingestion::IngestReportService is responsible for consuming the SBoM report.
- After it’s done, we fire off Sbom::BuildDependencyGraphWorker, which runs the dependency graph calculation in the background.
- Sbom::BuildDependencyGraph does the actual heavy lifting for us. The class is documented so the details are omitted here.
- We will skip calculation of the dependency graph if the SBoM report did not change.
- Sbom::PathFinder returns all possible paths to reach a target dependency. Note that it accepts an Sbom::Occurrence because a (name, version) pair is not precise enough when working with monorepos.
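The path lookup described in the last item can be sketched in plain Ruby. This is not the Sbom::PathFinder implementation: the Occurrence struct, the parents hash, and all_paths_to are hypothetical names used only to show why a specific occurrence, rather than a (name, version) pair, is needed to pick the right set of paths.

```ruby
# Hypothetical, simplified stand-in for Sbom::Occurrence.
Occurrence = Struct.new(:id, :name, :version, :input_file_path)

# parents: Hash of occurrence id => ids of occurrences that depend on it directly.
# Returns every path (array of ids) from a top-level dependency down to the target.
def all_paths_to(target, parents)
  walk = lambda do |id, suffix|
    path = [id] + suffix
    direct = parents.fetch(id, [])
    return [path] if direct.empty? # no parents: this is a top-level dependency

    # Skip ids already on the path so a cyclic graph cannot loop forever.
    direct.reject { |p| path.include?(p) }.flat_map { |p| walk.call(p, path) }
  end
  walk.call(target.id, [])
end

# The same (name, version) appears twice in a monorepo, once per lock file.
lodash_frontend = Occurrence.new(3, "lodash", "4.17.21", "frontend/package-lock.json")
lodash_admin    = Occurrence.new(6, "lodash", "4.17.21", "admin/package-lock.json")

parents = { 3 => [2], 2 => [1], 6 => [5, 4], 5 => [4] }

frontend_paths = all_paths_to(lodash_frontend, parents)
admin_paths    = all_paths_to(lodash_admin, parents)
puts frontend_paths.inspect
puts admin_paths.inspect
```

Both occurrences share a name and version, yet they are reached through different paths, so a lookup keyed only on (name, version) could not tell them apart.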
Details
- The database table is designed as a closure table.
- The database table structure is available.
- When a dependency is transitive then the corresponding Sbom::Occurrence#ancestors will contain entries.
- When a dependency is a direct dependency then the corresponding Sbom::Occurrence#ancestors will contain an {}.
- Dependencies can be both direct and transitive.
- There can be more than one version of a given dependency in a project (for example Node allows that).
- There can be more than one Sbom::Occurrence for a given dependency version, for example in monorepos. These Sbom::Occurrence rows should have a different input_file_path and source_id (however, we will not use source_id when building the dependency tree, to avoid an SQL JOIN).
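The rules above for Sbom::Occurrence#ancestors can be expressed as two small predicates. This is a sketch under assumptions: ancestors is modelled as a plain Ruby array of hashes, the name and version keys inside non-empty entries are assumed for illustration, and direct? and transitive? are hypothetical helpers rather than methods on the real model.

```ruby
# ancestors is modelled as an array of hashes (an assumption for this sketch).
# An empty hash marks a direct dependency; a non-empty hash names a parent.

def direct?(ancestors)
  ancestors.any?(&:empty?)
end

def transitive?(ancestors)
  ancestors.any? { |entry| !entry.empty? }
end

direct_only     = [{}]
transitive_only = [{ name: "express", version: "4.18.2" }]
both            = [{}, { name: "express", version: "4.18.2" }]

[direct_only, transitive_only, both].each do |ancestors|
  puts "direct=#{direct?(ancestors)} transitive=#{transitive?(ancestors)}"
end
```

The two predicates are independent on purpose: a dependency that is declared by the project and also pulled in by another package satisfies both.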