Vulnerability tracking overview

At GitLab, we combine Git with automated security testing in Continuous Integration and Continuous Delivery (CI/CD) pipelines. These pipelines continuously monitor code changes to detect security vulnerabilities as early as possible. Security testing often involves multiple Static Application Security Testing (SAST) tools, each specialized in detecting specific vulnerabilities, such as hardcoded passwords or insecure data flows. A heterogeneous SAST setup, using multiple tools, helps minimize the software's attack surface. The security findings from these tools undergo vulnerability management, a semi-manual process of understanding, categorizing, storing, and acting on them.

Code volatility (the constant change of the project’s source code) and double reporting (the overlap of findings reported by multiple tools) are potential sources of duplication, imposing futile auditing effort on the analyst.

Vulnerability tracking is an automated process that helps deduplicate and track vulnerabilities throughout the lifetime of a software project.

Our Vulnerability tracking method is based on Scope+Offset (internal).

The predecessor to the Scope+Offset method was line-based fingerprinting, which was more fragile and caused many already-detected vulnerabilities to be re-introduced as duplicates. Avoiding this duplication was the motivation for implementing the Scope+Offset method. See the corresponding research issue for more background (internal).

Components

At a high level, the vulnerability tracking flow is depicted below. For the remainder of this section, we assume that the SAST analyzer together with the Tracking Calculator represents the tracking signature producer component, and that the Rails backend represents the tracking signature consumer component for the purposes of vulnerability tracking. The components are explained in more detail below.

flowchart LR
  R["Repository"]
  S("SAST Analyzer [CI]")
  T("tracking-calculator [CI]")
  B("Rails backend")
  R --code--> S --gl-sast-report.json--> T --augmented gl-sast-report.json--> B
  R --code--> T

Tracking signature producer

The SAST Analyzer runs in a CI context, analyzes the source code, and produces a gl-sast-report.json file. The Tracking Calculator computes scopes from the source code and matches them with the vulnerabilities listed in gl-sast-report.json. If there is a match, the Tracking Calculator computes signatures (by means of Scope+Offset) and includes each of them in the original report (augmenting gl-sast-report.json) via the tracking object (depicted below).

      "tracking": {
        "type": "source",
        "items": [
          {
            "file": "test.c",
            "line_start": 12,
            "line_end": 12,
            "signatures": [
              {
                "algorithm": "scope_offset_compressed",
                "value": "test.c|main()[0]:5"
              },
              {
                "algorithm": "scope_offset",
                "value": "test.c|main()[0]:8"
              }
            ]
          }
        ]
      }
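The signature format above can be read as `<file>|<scope>[<index>]:<offset>`. A minimal sketch of how such a string might be assembled is shown below; note that this is an illustration, not the actual Tracking Calculator (which runs in CI alongside the analyzer), and the assumption that the offset is the finding line's distance from the start of its enclosing scope is ours:

```ruby
# Illustrative sketch: build a Scope+Offset signature string in the
# "<file>|<scope>[<index>]:<offset>" format seen in the report above.
# Assumption: the offset is the finding line's distance from the first
# line of its enclosing scope (here, the main() function).
def scope_offset_signature(file:, scope:, scope_index:, scope_start_line:, finding_line:)
  offset = finding_line - scope_start_line
  "#{file}|#{scope}[#{scope_index}]:#{offset}"
end

# With the finding on line 12 and main() assumed to start on line 4,
# this reproduces the scope_offset value from the example report.
scope_offset_signature(
  file: "test.c", scope: "main()", scope_index: 0,
  scope_start_line: 4, finding_line: 12
)
# => "test.c|main()[0]:8"
```

Because the signature is anchored to the enclosing scope rather than an absolute line number, it survives edits elsewhere in the file that would invalidate a line-based fingerprint.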

Tracking Calculator is directly embedded into the Docker image of the SAST Analyzer (internal) and invoked by means of this script.

It is important to note that the Tracking Calculator already performs deduplication, which is enabled by default. In the example above, we have two different algorithms, scope_offset_compressed and scope_offset, where scope_offset_compressed is considered an improvement over scope_offset and is therefore assigned a higher priority.

If scope_offset and scope_offset_compressed agree on the same fingerprint, only the result from scope_offset_compressed would be added as it is considered the algorithm with the higher priority.
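The rule can be sketched as follows. This is a simplified model, not the Tracking Calculator's Go implementation; the `fingerprint` field and the hash shape are illustrative assumptions:

```ruby
# Sketch of the deduplication rule: when several algorithms agree on the
# same fingerprint for a finding, keep only the signature produced by the
# highest-priority algorithm. Priorities mirror the Rails ALGORITHM_TYPES map.
PRIORITY = { "scope_offset" => 3, "scope_offset_compressed" => 4 }.freeze

def deduplicate(signatures)
  signatures
    .group_by { |sig| sig[:fingerprint] }   # signatures that agree clash
    .values
    .map { |group| group.max_by { |sig| PRIORITY.fetch(sig[:algorithm], 0) } }
end

deduplicate([
  { algorithm: "scope_offset",            fingerprint: "abc" },
  { algorithm: "scope_offset_compressed", fingerprint: "abc" }
])
# => only the scope_offset_compressed entry remains
```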

The report is then ingested into the consumer component where these signatures are used to generate vulnerability fingerprints by means of the vulnerability UUID.


Tracking signature consumer

In the Rails code, we differentiate between security findings (findings that originate from the report) and vulnerability findings (persisted in the database). Security findings are generated when the report is parsed; this is also the place where the UUID is generated.

Storing security findings temporarily

The diagram below depicts the flow that is executed on all pipelines for storing security findings temporarily. One of the most interesting components from the vulnerability tracking perspective is the OverrideUuidsService, which matches security findings against vulnerability findings on the signature level. If there is a match, the UUID of the security finding is overwritten accordingly. The StoreFindingsService then stores the re-calibrated findings in the security_findings table. Detailed documentation about how vulnerabilities are created, starting from the security report, is available here.

Source Code References:

sequenceDiagram
  Producer->>Sidekiq: gl-sast-report.json
  Sidekiq->>StoreScansWorker: <<start>>
  StoreScansWorker->>StoreScansService: pipeline id
  loop for all artifacts in "grouped" artifacts
    StoreScansService->>StoreGroupedScansService: artifacts
    loop for every artifact in artifacts
      StoreGroupedScansService->>StoreScanService: artifact
      StoreScanService->>OverrideUuidsService: security-report
      StoreScanService->>StoreFindingsService: store findings
    end
  end
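The matching step performed by the OverrideUuidsService can be sketched as below. The hash fields (`:uuid`, `:signatures`, `:overridden_uuid`) stand in for the actual ActiveRecord models and are illustrative only:

```ruby
# Sketch of the UUID override step: security findings from the report are
# matched against persisted vulnerability findings on the signature level.
# On a match, the security finding adopts the vulnerability finding's UUID
# and remembers its original UUID in :overridden_uuid.
def override_uuids(security_findings, vulnerability_findings)
  # Index persisted vulnerability findings by each of their signatures.
  uuid_by_signature = vulnerability_findings
    .flat_map { |vf| vf[:signatures].map { |sig| [sig, vf[:uuid]] } }
    .to_h

  security_findings.map do |sf|
    existing_uuid = sf[:signatures].filter_map { |sig| uuid_by_signature[sig] }.first
    next sf unless existing_uuid

    sf.merge(overridden_uuid: sf[:uuid], uuid: existing_uuid)
  end
end
```

The `overridden_uuid` attribute set here is what the ingestion scenario below relies on to detect clashes with existing vulnerabilities.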

Scenario 2: Merge request security widget

The second scenario relates to the merge request security widget.

Source code references:

The VulnerabilityReportsComparer compares the security findings between the default and non-default branches to compute the number of newly added and fixed findings. It also filters the results by recalibrating the security finding UUIDs, so that security findings corresponding to existing vulnerability findings are not re-displayed. The logic implemented in the UUIDOverrider is very similar to that of the OverrideUuidsService.

sequenceDiagram
  MergeRequestModel->>CompareSecurityReportsService: compare_sast_reports
  CompareSecurityReportsService->>VulnerabilityReportsComparer: calculate_changes
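Conceptually, the added/fixed computation reduces to a set difference over the (recalibrated) UUIDs. A minimal sketch, with method and field names of our own choosing rather than the actual comparer API:

```ruby
# Sketch of the comparison behind the merge request security widget:
# findings present on the head (non-default) branch but not on the base
# (default) branch are "added"; findings on base but not head are "fixed".
# Comparison is by UUID, after the UUIDs have been recalibrated.
def calculate_changes(base_uuids, head_uuids)
  {
    added: head_uuids - base_uuids,
    fixed: base_uuids - head_uuids
  }
end

calculate_changes(["u1", "u2"], ["u2", "u3"])
# => { added: ["u3"], fixed: ["u1"] }
```

Recalibration matters here: without it, a finding that already exists as a vulnerability on the default branch would carry a different UUID on the head branch and be wrongly counted as "added".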

Scenario 3: Report ingestion

This is the point where either a security finding becomes a vulnerability or the vulnerability that corresponds to a security finding is updated. This scenario becomes relevant when a pipeline is triggered on the default branch after merging a non-default branch into it. In our context, we are most interested in cases where security findings have overridden_uuid set, which implies that there was a clash with an already existing vulnerability; overridden_uuid holds the UUID of the security finding that was overridden by the corresponding vulnerability UUID.

The sequence below is executed to update the UUID of a vulnerability (fingerprint). The recomputation takes place in UpdateVulnerabilityUuids, which ultimately invokes a database update by means of the UpdateVulnerabilityUuidsVulnerabilityFinding class.

Source Code References:

sequenceDiagram
  IngestReportsService->>IngestReportService: security_scan
  IngestReportService->>IngestReportSliceService: sliced security_scan
  IngestReportSliceService->>UpdateVulnerabilityUuids: findings map
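The final update step can be pictured as applying a findings map of old-to-new UUIDs to the stored vulnerability findings. The sketch below uses plain hashes in place of database rows and is an assumption about the shape of that map, not the actual Rails implementation:

```ruby
# Sketch of the ingestion-time UUID update: given a map from an existing
# (stale) UUID to the recomputed UUID, rewrite each stored vulnerability
# finding whose UUID appears in the map; all others pass through unchanged.
def update_vulnerability_uuids(findings, uuid_map)
  findings.map do |finding|
    new_uuid = uuid_map[finding[:uuid]]
    new_uuid ? finding.merge(uuid: new_uuid) : finding
  end
end

update_vulnerability_uuids(
  [{ uuid: "old-1" }, { uuid: "keep-2" }],
  { "old-1" => "new-1" }
)
# => [{ uuid: "new-1" }, { uuid: "keep-2" }]
```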

Hierarchy: Why are algorithms prioritized and what is the impact of this prioritization?

The supported algorithms are defined in VulnerabilityFindingSignatureHelpers. Algorithms are assigned priorities (the integer values in the map below). A higher priority indicates that an algorithm is considered better than a lower-priority algorithm. In other words, moving from a lower-priority to a higher-priority algorithm corresponds to coarsening (better deduplication performance), while moving from a higher-priority to a lower-priority algorithm corresponds to refinement (weaker deduplication performance).

  ALGORITHM_TYPES = {
    hash: 1,
    location: 2,
    scope_offset: 3,
    scope_offset_compressed: 4,
    rule_value: 5
  }.with_indifferent_access.freeze
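The effect of the prioritization is that, among the algorithms available for a given finding, the highest-priority one wins. A standalone sketch of that selection (using a plain frozen hash instead of Rails' with_indifferent_access, and a helper name of our own, not the actual one in VulnerabilityFindingSignatureHelpers):

```ruby
# Same priority values as the ALGORITHM_TYPES map above.
ALGORITHM_TYPES = {
  hash: 1,
  location: 2,
  scope_offset: 3,
  scope_offset_compressed: 4,
  rule_value: 5
}.freeze

# Illustrative helper: pick the best (highest-priority) algorithm among
# those available for a finding; unknown algorithms rank lowest.
def best_algorithm(available)
  available.max_by { |algo| ALGORITHM_TYPES.fetch(algo, 0) }
end

best_algorithm([:location, :scope_offset_compressed, :hash])
# => :scope_offset_compressed
```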