SBoM dependency graph ingestion overview
Overview
The process starts only after all Sbom::Occurrence models have been ingested. Occurrences are ingested in slices, and building the dependency graph slice by slice as well would be tricky.
All work happens in a background worker, which will be added in a subsequent MR, so that we do not increase the time it takes to ingest an SBoM report. This means there will be a delay between when the SBoM report is ingested and when the dependency graph is updated.
All records pertaining to dependency graphs are stored in the sbom_graph_paths database table, which has foreign keys to sbom_occurrences as well as projects for easier filtering.
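The Details section describes this table as a closure table. As a rough illustration of that idea, the sketch below expands direct parent-to-child edges into one row per reachable (ancestor, descendant) pair. The GraphPath struct, its column names, and the closure_rows helper are assumptions made for this example; the real sbom_graph_paths columns and the real build logic may differ. The sketch also keeps only the shortest distance per pair and omits self-referencing rows, which some closure table designs include.

```ruby
# Hypothetical row shape for illustration; not the actual table schema.
GraphPath = Struct.new(:project_id, :ancestor_id, :descendant_id, :path_length)

# edges: array of [parent_occurrence_id, child_occurrence_id] pairs,
# one pair per direct "parent depends on child" relationship.
def closure_rows(project_id, edges)
  children = Hash.new { |hash, key| hash[key] = [] }
  edges.each { |parent, child| children[parent] << child }

  rows = []
  children.keys.each do |ancestor|
    # Breadth-first walk from each ancestor, recording every descendant
    # the first time it is reached (that is, at its shortest distance).
    queue = children[ancestor].map { |child| [child, 1] }
    seen = {}
    until queue.empty?
      node, length = queue.shift
      next if seen[node]

      seen[node] = true
      rows << GraphPath.new(project_id, ancestor, node, length)
      children.fetch(node, []).each { |child| queue << [child, length + 1] }
    end
  end
  rows
end

# Occurrence 1 depends on 2 and 3; occurrence 2 also depends on 3.
example_rows = closure_rows(42, [[1, 2], [2, 3], [1, 3]])
example_triples = example_rows.map { |row| [row.ancestor_id, row.descendant_id, row.path_length] }
```

Because every reachable pair is materialized, a lookup such as "which occurrences eventually pull in occurrence 3" becomes a single filter on descendant_id, with no recursive query.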
Implementation details
This feature is a work in progress, so this document can get out of date.
- Sbom::Ingestion::IngestReportService is responsible for consuming the SBoM report.
- After it is done, we fire off Sbom::BuildDependencyGraphWorker, which runs the dependency graph calculation in a background worker.
- Sbom::BuildDependencyGraph does the actual heavy lifting for us. The class is documented so the details are omitted here.
- We will skip calculation of the dependency graph if the SBoM report did not change.
- Sbom::PathFinder returns all possible paths that reach a target dependency. Do note that this accepts an Sbom::Occurrence because a (name, version) pair is not precise enough when working with monorepos.
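To make the path-finding idea concrete, here is a minimal sketch of enumerating every path that leads to a target occurrence. It is not the actual Sbom::PathFinder implementation: the all_paths_to helper, the parents map, and the direct_ids set are hypothetical, and plain integer ids stand in for Sbom::Occurrence records.

```ruby
require 'set'

# parents:    maps an occurrence id to the ids of occurrences that depend on it directly.
# direct_ids: ids of occurrences that are declared as direct dependencies.
# Returns an array of paths, each ordered from the top-level dependency to the target.
def all_paths_to(target_id, parents, direct_ids, visiting = Set.new)
  return [] if visiting.include?(target_id) # guard against cyclic input

  paths = []
  paths << [target_id] if direct_ids.include?(target_id)

  parents.fetch(target_id, []).each do |parent_id|
    all_paths_to(parent_id, parents, direct_ids, visiting | [target_id]).each do |path|
      paths << path + [target_id]
    end
  end
  paths
end

# Occurrences 2 and 3 share the same (name, version) but live in different
# lock files of a monorepo. Occurrence 2 is both direct and pulled in by 1.
parents = { 2 => [1] }
direct_ids = Set[1, 2, 3]

paths_to_two = all_paths_to(2, parents, direct_ids)
paths_to_three = all_paths_to(3, parents, direct_ids)
```

Keying the lookup by occurrence rather than by (name, version) is what keeps the results for occurrences 2 and 3 separate in this example, even though both describe the same package version.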
Details
- The database table is designed as a closure table.
- The database table structure is available.
- When a dependency is transitive then the corresponding Sbom::Occurrence#ancestors will contain entries.
- When a dependency is a direct dependency then the corresponding Sbom::Occurrence#ancestors will contain an {}.
- Dependencies can be both direct and transitive.
- There can be more than one version of a given dependency in a project (for example Node allows that).
- There can be more than one Sbom::Occurrence for a given dependency version, for example in monorepos. These Sbom::Occurrence rows should have a different input_file_path and source_id (however, we will not use source_id when building the dependency tree to avoid an SQL JOIN).
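The rules above can be sketched in a few lines. This example assumes that ancestors is an array of hashes in which an empty hash marks "no parent"; that shape, the Occurrence struct, the helper names, and the package names are all hypothetical stand-ins rather than the real model.

```ruby
# Hypothetical stand-in for an Sbom::Occurrence row.
Occurrence = Struct.new(:id, :name, :version, :input_file_path, :ancestors)

# Direct: at least one ancestor entry is the empty hash.
def direct?(occurrence)
  occurrence.ancestors.any?(&:empty?)
end

# Transitive: at least one ancestor entry names a parent dependency.
def transitive?(occurrence)
  occurrence.ancestors.any? { |ancestor| !ancestor.empty? }
end

parent = { name: 'web-framework', version: '2.0.0' }

occurrences = [
  # Same dependency version in two lock files of a monorepo.
  Occurrence.new(1, 'util-lib', '1.0.0', 'app-a/package-lock.json', [{}]),
  Occurrence.new(2, 'util-lib', '1.0.0', 'app-b/package-lock.json', [parent]),
  # A second version of the same dependency, both direct and transitive.
  Occurrence.new(3, 'util-lib', '0.9.0', 'app-b/package-lock.json', [parent, {}])
]

# Grouping by (name, version) merges occurrences 1 and 2, which is why
# that pair alone cannot identify a node in the graph.
by_name_version = occurrences.group_by { |occurrence| [occurrence.name, occurrence.version] }
by_file = occurrences.group_by(&:input_file_path)
```

In this sketch, input_file_path is enough to tell occurrences 1 and 2 apart, which matches the choice to distinguish rows without joining on source_id.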