Vulnerability tracking overview
At GitLab, we combine Git with automated security testing in Continuous Integration and Continuous Delivery (CI/CD) pipelines. These pipelines continuously monitor code changes to detect security vulnerabilities as early as possible. Security testing often involves multiple Static Application Security Testing (SAST) tools, each specialized in detecting specific vulnerabilities, such as hardcoded passwords or insecure data flows. A heterogeneous SAST setup, using multiple tools, helps minimize the software's attack surface. The security findings from these tools undergo Vulnerability Management, a semi-manual process of understanding, categorizing, storing, and acting on them.
Code volatility (the constant change of the project’s source code) and double reporting (the overlap of findings reported by multiple tools) are potential sources of duplication, imposing futile auditing effort on the analyst.
Vulnerability tracking is an automated process that helps deduplicate and track vulnerabilities throughout the lifetime of a software project.
Our vulnerability tracking method is based on Scope+Offset (internal). Its predecessor was line-based fingerprinting, which was more fragile and caused many already-detected vulnerabilities to be re-introduced. Avoiding this duplication was the motivation for implementing the Scope+Offset method. See the corresponding research issue for more background (internal).
Components
At a very high level, the vulnerability tracking flow is depicted below. For the remainder of this section, we assume that the SAST analyzer and the Tracking Calculator together represent the tracking signature producer component, and that the Rails backend represents the tracking signature consumer component for the purposes of vulnerability tracking. The components are explained in more detail below.
Tracking signature producer
The SAST Analyzer runs in a CI context, analyzes the source code, and produces a `gl-sast-report.json` file. The Tracking Calculator computes scopes from the source code and matches them with the vulnerabilities listed in `gl-sast-report.json`. If there is a match, the Tracking Calculator computes signatures (by means of Scope+Offset) and includes each of them in the original report (augmenting `gl-sast-report.json`) via the `tracking` object (depicted below).
"tracking": {
"type": "source",
"items": [
{
"file": "test.c",
"line_start": 12,
"line_end": 12,
"signatures": [
{
"algorithm": "scope_offset_compressed",
"value": "test.c|main()[0]:5"
},
{
"algorithm": "scope_offset",
"value": "test.c|main()[0]:8"
}
]
}
]
}
The Tracking Calculator is directly embedded into the Docker image of the SAST Analyzer (internal) and invoked by means of this script.
It is important to note that the Tracking Calculator already performs deduplication, which is enabled by default. In the example above we have two different algorithms, `scope_offset_compressed` and `scope_offset`. Because `scope_offset_compressed` is considered an improvement over `scope_offset`, it is assigned a higher priority. If `scope_offset` and `scope_offset_compressed` agree on the same fingerprint, only the result from `scope_offset_compressed` is added, as it is the algorithm with the higher priority.
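The deduplication described above can be sketched as grouping signatures by fingerprint value and keeping, within each group, only the signature from the highest-priority algorithm. The priority values mirror the ordering described in this document; the method name and data shapes are illustrative, not the Tracking Calculator's actual code.

```ruby
# Sketch of priority-based deduplication: when two algorithms agree on
# the same fingerprint, only the higher-priority one is kept. The map
# and method name are illustrative assumptions.
PRIORITY = { 'scope_offset' => 3, 'scope_offset_compressed' => 4 }.freeze

def deduplicate(signatures)
  signatures
    .group_by { |sig| sig[:value] }
    .map { |_value, group| group.max_by { |sig| PRIORITY.fetch(sig[:algorithm], 0) } }
end

signatures = [
  { algorithm: 'scope_offset_compressed', value: 'test.c|main()[0]:5' },
  { algorithm: 'scope_offset',            value: 'test.c|main()[0]:5' }
]
deduplicate(signatures)
# only the scope_offset_compressed entry remains
```

Signatures with different fingerprint values (as in the JSON example above, `:5` versus `:8`) fall into separate groups, so both are kept.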
The report is then ingested by the consumer component, where these signatures are used to generate vulnerability fingerprints in the form of the vulnerability UUID.
Tracking signature consumer
In the Rails code we differentiate between security findings (findings that originate from the report) and vulnerability findings (persisted in the DB). Security findings are generated when the report is parsed; this is also where the UUID is generated.
Scenario 1: Storing security findings temporarily
The diagram below depicts the flow that is executed on all pipelines for storing security findings temporarily. One of the most interesting components from the vulnerability tracking perspective is the `OverrideUuidsService`. The `OverrideUuidsService` matches security findings against vulnerability findings on the signature level. If there is a match, the UUID of the security finding is overwritten accordingly. The `StoreFindingsService` stores the re-calibrated findings in the `security_findings` table. Detailed documentation about how vulnerabilities are created, starting from the security report, is available here.
Source Code References:
- StoreScansWorker
- StoreScansService
- StoreGroupedScansService
- StoreScanService
- OverrideUuidsService
- StoreFindingsService
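The signature-level matching performed by a service like the `OverrideUuidsService` can be sketched as follows, using plain hashes as stand-ins for the actual ActiveRecord models; all field names here are illustrative assumptions.

```ruby
# Sketch of signature-level matching: security findings that share a
# signature with a persisted vulnerability finding adopt its UUID, and
# the report-computed UUID is preserved in overridden_uuid (as described
# in Scenario 3). Data shapes are illustrative, not the real models.
def override_uuids!(security_findings, vulnerability_findings)
  # Index the persisted vulnerability findings by their signature values.
  by_signature = {}
  vulnerability_findings.each do |vf|
    vf[:signatures].each { |sig| by_signature[sig] = vf }
  end

  security_findings.each do |finding|
    match = finding[:signatures].map { |sig| by_signature[sig] }.compact.first
    next unless match

    # Keep the report-computed UUID around, then adopt the persisted one.
    finding[:overridden_uuid] = finding[:uuid]
    finding[:uuid] = match[:uuid]
  end
end
```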
Scenario 2: Merge request security widget
The second scenario relates to the merge request security widget.
The `VulnerabilityReportsComparer` computes the number of newly added or fixed findings. It first compares the security findings between the default and non-default branches to compute the number of added and fixed findings. It then recalibrates the security finding UUIDs so that security findings which correspond to existing vulnerability findings are filtered out and not displayed again. The logic implemented in the `UUIDOverrider` is very similar to that of the `OverrideUuidsService`.
Source code references:
- VulnerabilityReportsComparer
- UUIDOverrider
Scenario 3: Report ingestion
This is the point where a security finding either becomes a vulnerability or updates the vulnerability it corresponds to. This scenario becomes relevant when a pipeline is triggered on the default branch, upon merging a non-default branch into the default branch. In our context, we are most interested in the cases where a security finding has `overridden_uuid` set, which implies that there was a clash with an already existing vulnerability; `overridden_uuid` holds the UUID of the security finding that was overridden by the corresponding vulnerability UUID.
The sequence below is executed to update the UUID of a vulnerability (fingerprint). The recomputation takes place in `UpdateVulnerabilityUuids`, which ultimately invokes a database update by means of the `UpdateVulnerabilityUuidsVulnerabilityFinding` class.
Source Code References:
- IngestReportsService
- IngestReportService
- IngestReportSliceService
- UpdateVulnerabilityUuids
- FindingMap
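At its core, the database update at the end of this sequence replaces the stored UUID of a vulnerability finding with a newly computed one. Below is a hedged sketch using an in-memory array as a stand-in for the database table; which fields of the real `FindingMap` supply the old and new UUIDs is not spelled out here, so they are passed as plain arguments.

```ruby
# Illustrative stand-in for the UUID update: find the persisted
# vulnerability finding carrying the old fingerprint and rewrite it to
# the newly computed one. The method name and table shape are
# assumptions for this sketch, not the real ingestion code.
def update_vulnerability_uuid!(findings_table, old_uuid:, new_uuid:)
  record = findings_table.find { |row| row[:uuid] == old_uuid }
  return false unless record

  record[:uuid] = new_uuid
  true
end

table = [{ uuid: 'aaa-111' }, { uuid: 'bbb-222' }]
update_vulnerability_uuid!(table, old_uuid: 'aaa-111', new_uuid: 'ccc-333')
# table.first[:uuid] => "ccc-333"
```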
Hierarchy: Why are algorithms prioritized and what is the impact of this prioritization?
The supported algorithms are defined in `VulnerabilityFindingSignatureHelpers`. Algorithms are assigned priorities (the integer values in the map below). A higher priority indicates that an algorithm is considered better than one with a lower priority. In other words, moving from a lower-priority to a higher-priority algorithm corresponds to a coarsening (better deduplication performance), and moving from a higher-priority to a lower-priority algorithm corresponds to a refinement (weaker deduplication performance).
```ruby
ALGORITHM_TYPES = {
  hash: 1,
  location: 2,
  scope_offset: 3,
  scope_offset_compressed: 4,
  rule_value: 5
}.with_indifferent_access.freeze
```
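Given this map, selecting the best available signature for a finding amounts to taking the one whose algorithm has the highest priority. A minimal self-contained sketch, using a plain frozen hash instead of Rails' `with_indifferent_access` and an illustrative helper name:

```ruby
# Pick the signature produced by the highest-priority algorithm.
# ALGORITHM_TYPES mirrors the map above; highest_priority_signature is
# an illustrative helper, not part of the actual Rails backend.
ALGORITHM_TYPES = {
  hash: 1,
  location: 2,
  scope_offset: 3,
  scope_offset_compressed: 4,
  rule_value: 5
}.freeze

def highest_priority_signature(signatures)
  signatures.max_by { |sig| ALGORITHM_TYPES.fetch(sig[:algorithm].to_sym, 0) }
end

sigs = [
  { algorithm: 'scope_offset',            value: 'test.c|main()[0]:8' },
  { algorithm: 'scope_offset_compressed', value: 'test.c|main()[0]:5' }
]
highest_priority_signature(sigs)[:algorithm]
# => "scope_offset_compressed"
```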