SBoM dependency graph ingestion overview

Overview

The process starts only after all Sbom::Occurrence models have been ingested. Occurrences are ingested in slices, and it would be tricky to build the dependency graph slice by slice as well.

All work happens in a background worker, which will be added in a subsequent MR, so that we do not increase the time it takes to ingest an SBoM report. This means there will be a delay between when the SBoM report is ingested and when the dependency graph is updated.

All records pertaining to dependency graphs are stored in the sbom_graph_paths database table, which has foreign keys to sbom_occurrences and to projects for easier filtering.

Implementation details

This feature is a work in progress, so this document can get out of date.

Sbom::Ingestion::IngestReportService is responsible for consuming the SBoM report.
After it's done, we enqueue Sbom::BuildDependencyGraphWorker, which runs the dependency graph calculation in a background worker.
Sbom::BuildDependencyGraph does the actual heavy lifting for us. The class is documented so the details are omitted here.
We will skip calculation of the dependency graph if the SBoM report did not change.
Sbom::PathFinder returns all possible paths to reach a target dependency. Note that it accepts an Sbom::Occurrence because a (name, version) pair is not precise enough when working with monorepos.
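To illustrate what "all possible paths" means, here is a minimal in-memory sketch. It is not the Sbom::PathFinder implementation, which works against the database. The method name all_paths_to, the parents hash, and the :project marker are hypothetical names chosen for this example, and plain strings stand in for Sbom::Occurrence records.

```ruby
# Hypothetical sketch: enumerate every path from the project to a target.
#
# parents maps each dependency to whatever requires it directly.
# The symbol :project (an assumption of this sketch) marks a direct dependency.
def all_paths_to(target, parents, suffix = [])
  path = [target] + suffix
  parents.fetch(target, []).flat_map do |parent|
    if parent == :project
      [path]   # reached the project itself, so this path is complete
    elsif path.include?(parent)
      []       # parent is already on this path: skip it to avoid looping on a cycle
    else
      all_paths_to(parent, parents, path)
    end
  end
end

# "rack" is both a direct dependency and a transitive one (through "rails").
parents = {
  "rack"  => [:project, "rails"],
  "rails" => [:project]
}

paths = all_paths_to("rack", parents)
# paths == [["rack"], ["rails", "rack"]]
```

Each returned path lists the dependencies from the top-level one down to the target. Keying the lookup by occurrence rather than by (name, version) is what keeps paths from different parts of a monorepo apart.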

Details

The database table is designed as a closure table.
The database table structure is available.
When a dependency is transitive, the corresponding Sbom::Occurrence#ancestors will contain entries.
When a dependency is a direct dependency, the corresponding Sbom::Occurrence#ancestors will contain an empty {} entry.
Dependencies can be both direct and transitive.
There can be more than one version of a given dependency in a project (Node, for example, allows this).
There can be more than one Sbom::Occurrence for a given dependency version, for example in monorepos. These Sbom::Occurrence rows should have a different input_file_path and source_id (however, we will not use source_id when building the dependency tree, to avoid an SQL JOIN).
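For readers unfamiliar with closure tables, the following sketch shows the general idea in plain Ruby: every ancestor/descendant pair gets its own row, so "which dependencies lead to X" becomes a simple filter instead of a recursive walk. The row keys (ancestor, descendant, path_length) and the method name build_closure_rows are assumptions made for illustration. The actual sbom_graph_paths columns are not described in this document and may differ. This sketch also omits the self-referencing rows that some closure table designs include.

```ruby
require "set"

# Hypothetical sketch: derive closure rows from direct "requires" edges.
# edges maps a dependency to the dependencies it requires directly.
def build_closure_rows(edges)
  rows = []
  edges.each_key do |start|
    # Breadth-first walk: the first visit to a node is at its shortest distance.
    seen = Set.new([start])
    queue = [[start, 0]]
    until queue.empty?
      node, depth = queue.shift
      edges.fetch(node, []).each do |child|
        next if seen.include?(child)

        seen << child
        rows << { ancestor: start, descendant: child, path_length: depth + 1 }
        queue << [child, depth + 1]
      end
    end
  end
  rows
end

edges = {
  "app"   => ["rails", "rack"],
  "rails" => ["rack"],
  "rack"  => []
}

rows = build_closure_rows(edges)

# Looking up every ancestor of "rack" is a plain filter over the rows.
ancestors_of_rack = rows.select { |r| r[:descendant] == "rack" }.map { |r| r[:ancestor] }
# ancestors_of_rack == ["app", "rails"]
```

The trade-off is the usual one for closure tables: more rows to write when the graph is built, in exchange for cheap reads afterwards.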