Merge request diffs development guide

This document explains the backend design and flow of merge request diffs. It should help contributors:

  • Understand the code design.
  • Identify areas for improvement through contribution.

This page intentionally omits most implementation details because they change often, and the code explains them better. The components mentioned here are the major parts of the application that generate, store, and return merge request diffs to users.

Note: This page is a living document. Update it when the parts of the codebase it describes are changed or removed, or when new components are added.

Data model

Four main ActiveRecord models represent what we collectively refer to as diffs. These database-backed records replicate data contained in the project’s Git repository, and are in part a cache against excessive access requests to Gitaly. Additionally, they provide a logical place for:

  • Calculated and retrieved metadata about the pieces of the diff.
  • General class- and instance-based logic.

Diagram (Data model of diffs): Data model of the four ActiveRecord models used in diffs. Entities shown: MergeRequest, MergeRequestDiff, MergeRequestDiffCommit, MergeRequestDiffDetail, MergeRequestDiffFile, MergeRequestDiffCommitUser.

MergeRequestDiff

MergeRequestDiff is defined in app/models/merge_request_diff.rb. This class holds metadata and context related to the diff resulting from a set of commits. It defines methods that are the primary means for interacting with diff contents, individual commits, and the files containing changes.

#<MergeRequestDiff:0x00007fd1ed63b4d0
 id: 28,
 state: "collected",
 merge_request_id: 28,
 created_at: Tue, 06 Sep 2022 18:56:02.509469000 UTC +00:00,
 updated_at: Tue, 06 Sep 2022 18:56:02.754201000 UTC +00:00,
 base_commit_sha: "ae73cb07c9eeaf35924a10f713b364d32b2dd34f",
 real_size: "9",
 head_commit_sha: "bb5206fee213d983da88c47f9cf4cc6caf9c66dc",
 start_commit_sha: "0b4bc9a49b562e85de7cc9e834518ea6828729b9",
 commits_count: 6,
 external_diff: "diff-28",
 external_diff_store: 1,
 stored_externally: nil,
 files_count: 9,
 patch_id_sha: "d504412d5b6e6739647e752aff8e468dde093f2f",
 sorted: true,
 diff_type: "regular",
 verification_checksum: nil>

Diff content is usually accessed through this class. Logic is often applied to diff, file, and commit content before it is returned to a user.

MergeRequestDiff#commits_count

When MergeRequestDiff is saved, associated MergeRequestDiffCommit records are counted and cached into the commits_count column. This number displays on the merge request page as the counter for the Commits tab.

If MergeRequestDiffCommit records are deleted, the counter doesn’t update.
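
The stale-counter behavior can be illustrated with a minimal plain-Ruby sketch. It does not use ActiveRecord: `DiffSketch` and its `save` method are hypothetical stand-ins for illustration, not the real model or its callbacks.

```ruby
# Hypothetical stand-in for MergeRequestDiff; the real class is an ActiveRecord model.
class DiffSketch
  attr_reader :commits, :commits_count

  def initialize(commits = [])
    @commits = commits
    @commits_count = 0
  end

  # Mimics caching the number of associated commit records at save time.
  def save
    @commits_count = @commits.size
    self
  end
end

diff = DiffSketch.new(%w[sha1 sha2 sha3]).save
puts diff.commits_count # cached at save time

# Removing a commit record afterwards leaves the cached counter untouched.
diff.commits.pop
puts diff.commits.size  # actual number of remaining records
puts diff.commits_count # cached value, now out of date in this sketch
```

Because the Commits tab counter reads the cached column, it can disagree with the number of remaining records in a situation like this one.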

MergeRequestDiffCommit

MergeRequestDiffCommit is defined in app/models/merge_request_diff_commit.rb. This class corresponds to a single commit contained in its corresponding MergeRequestDiff, and holds header information about the commit.

#<MergeRequestDiffCommit:0x00007fd1dfc6c4c0
  authored_date: Wed, 06 Aug 2022 06:35:52.000000000 UTC +00:00,
  committed_date: Wed, 06 Aug 2022 06:35:52.000000000 UTC +00:00,
  merge_request_diff_id: 28,
  relative_order: 0,
  sha: "bb5206fee213d983da88c47f9cf4cc6caf9c66dc",
  message: "Feature conflcit added\n\nSigned-off-by: Sample User <sample.user@example.com>\n",
  trailers: {},
  commit_author_id: 19,
  committer_id: 19>

Every MergeRequestDiffCommit belongs to (:belongs_to, in ActiveRecord parlance) MergeRequest::DiffCommitUser records through two associations, :commit_author and :committer. The two associations can point to distinct individuals.

MergeRequest::DiffCommitUser

MergeRequest::DiffCommitUser is defined in app/models/merge_request/diff_commit_user.rb. It captures the name and email of the author or committer of a given commit, but has no connection of its own to any User records.

#<MergeRequest::DiffCommitUser:0x00007fd1dff7c930
  id: 19,
  name: "Sample User",
  email: "sample.user@example.com">
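
As a rough illustration of the author and committer associations, here is a plain-Ruby sketch. `CommitUserSketch`, `CommitSketch`, and `authored_by_committer?` are hypothetical names, and the second user is made-up sample data; the real classes are ActiveRecord models.

```ruby
# Hypothetical stand-ins for MergeRequest::DiffCommitUser and MergeRequestDiffCommit.
CommitUserSketch = Struct.new(:id, :name, :email)

CommitSketch = Struct.new(:sha, :commit_author, :committer) do
  # True when the same person both authored and committed the change.
  def authored_by_committer?
    commit_author == committer
  end
end

author = CommitUserSketch.new(19, "Sample User", "sample.user@example.com")
other  = CommitUserSketch.new(20, "Other User", "other.user@example.com")

# Same record for both associations, as in the example output above.
same = CommitSketch.new("bb5206fee213d983da88c47f9cf4cc6caf9c66dc", author, author)
# A commit authored by one person and committed by another.
split = CommitSketch.new("0b4bc9a49b562e85de7cc9e834518ea6828729b9", author, other)

puts same.authored_by_committer?
puts split.authored_by_committer?
```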

MergeRequestDiffFile

MergeRequestDiffFile is defined in app/models/merge_request_diff_file.rb. A record of this class represents the diff of a single file contained in the MergeRequestDiff. It holds both metadata and specific information about the file’s relationship to the change, such as:

  • Whether it is added or renamed.
  • Its ordering in the diff.
  • The raw diff output itself.

External diff storage

By default, the diff data of a MergeRequestDiffFile is stored in the diff column of the merge_request_diff_files table. On some installations this table can grow too large, so they are configured to store diffs on external storage to save space. To configure it, see Merge request diffs storage.

When configured to use external storage:

  • The diff column in the database is left NULL.
  • The associated MergeRequestDiff record sets the stored_externally attribute to true on creation of MergeRequestDiff.

A cron job named ScheduleMigrateExternalDiffsWorker is also scheduled at minute 15 of every hour. It migrates diffs that are still stored in the database to external storage.
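
The two storage modes can be sketched in plain Ruby. This is a minimal illustration under one assumption that this page does not state: that external_diff_offset and external_diff_size locate one file's diff inside a single external blob. `FileSketch` and `raw_diff_for` are hypothetical names, not part of the codebase.

```ruby
# Hypothetical stand-in for MergeRequestDiffFile, limited to the storage-related columns.
FileSketch = Struct.new(:new_path, :diff, :external_diff_offset, :external_diff_size,
                        keyword_init: true)

# Returns the raw diff for one file: from the database column when it is
# populated, otherwise from a byte range of the external blob (assumed layout).
def raw_diff_for(file, external_blob)
  return file.diff unless file.diff.nil?

  external_blob.byteslice(file.external_diff_offset, file.external_diff_size)
end

first  = "@@ -1 +1 @@\n-a\n+b\n"
second = "@@ -0,0 +1 @@\n+c\n"
blob   = first + second

in_database = FileSketch.new(new_path: "a.rb", diff: first)
external    = FileSketch.new(new_path: "c.rb", diff: nil,
                             external_diff_offset: first.bytesize,
                             external_diff_size: second.bytesize)

puts raw_diff_for(in_database, blob)
puts raw_diff_for(external, blob)
```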

MergeRequestDiffDetail

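MergeRequestDiffDetail is defined in app/models/merge_request_diff_detail.rb. This class provides verification information for Geo replication, but otherwise is not used for user-facing diffs.

The example below shows a MergeRequestDiffFile record, the class described in the MergeRequestDiffFile section, not a MergeRequestDiffDetail record.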

#<MergeRequestDiffFile:0x00007fd1ef7c9048
  merge_request_diff_id: 28,
  relative_order: 0,
  new_file: true,
  renamed_file: false,
  deleted_file: false,
  too_large: false,
  a_mode: "0",
  b_mode: "100644",
  new_path: "files/ruby/feature.rb",
  old_path: "files/ruby/feature.rb",
  diff:
   "@@ -0,0 +1,4 @@\n+# This file was changed in feature branch\n+# We put different code here to make merge conflict\n+class Conflict\n+end\n",
  binary: false,
  external_diff_offset: nil,
  external_diff_size: nil>

Flow

These flowcharts should help explain the flow from the controllers down to the models for different features. This page does not document every way to access and work with diffs. It focuses on the most common ones.

Generation of MergeRequestDiff* records

As explained above, we use database tables to cache information from Gitaly when displaying diffs on merge requests. When enabled, we also use object storage when storing diffs.

We have two types of merge request diffs: base diff and HEAD diff. Each type is generated differently.

Base diff

On every push to a merge request branch, we create a new merge request diff version.

This flowchart shows a basic explanation of how each component is used in this case.

Flowchart (Flowchart of generating a new diff version): High-level flowchart of components used when creating a new diff version, based on a Git push to a branch. Nodes shown, in order: PostReceive worker, MergeRequests::RefreshService, Reload diff of merge requests, Create merge request diff, Database, Ensure commit SHAs, Gitaly, Set patch-id, Save commits, Save diffs, Object Storage, Keep around commits, Clear highlight and stats cache, Redis.

This sequence diagram shows a more detailed explanation of this flow.

Sequence diagram (Data flow of building a new diff): Detailed model of the data flow through the components that build a new diff version.

  • Participants: PostReceive, MergeRequests::RefreshService, MergeRequest, MergeRequests::ReloadDiffsService, MergeRequestDiff, Repository, Gitaly, Commit, MergeRequestDiffCommit, ObjectStorage, MergeRequestDiffFile, Gitlab::Diff::HighlightCache, Gitlab::Diff::StatsCache, Redis.
  • Phases: Reload diff of merge requests, Create merge request diff, Ensure commit SHAs, Set patch-id, Save commits, Save diffs (with an optional block when external diffs is enabled), Keep around commits, Clear highlight and stats cache.
  • Gitaly RPCs: FindCommit, GetPatchID, ListCommits, WriteRef.
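
The order of the phases named in the sequence diagram can be summarized as a small plain-Ruby sketch. It only records step names. `diff_version_steps` and the symbols it returns are hypothetical labels derived from the diagram (for example, `upload diffs` and `legacy_bulk_insert()`), not methods in the codebase.

```ruby
# Returns the phases of building a new diff version, in the order the
# sequence diagram lists them. Names are illustrative labels only.
def diff_version_steps(external_diffs_enabled:)
  steps = %i[ensure_commit_shas set_patch_id save_commits]

  # The upload happens only inside the optional "external diffs" block.
  steps << :upload_diffs_to_object_storage if external_diffs_enabled
  steps << :bulk_insert_diff_files

  steps + %i[keep_around_commits clear_highlight_and_stats_cache]
end

puts diff_version_steps(external_diffs_enabled: true).inspect
puts diff_version_steps(external_diffs_enabled: false).inspect
```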

HEAD diff

Whenever the mergeability of a merge request is checked and the merge request merge_status is :unchecked, :cannot_be_merged_recheck, :checking, or :cannot_be_merged_rechecking, we attempt to merge the changes from the source branch into the target branch and write the result to a ref. If the merge succeeds (that is, there is no conflict), we generate a diff based on the generated commit and show it as the HEAD diff.
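
The condition described above can be sketched in plain Ruby. The status names come from the paragraph above; `RECHECK_STATUSES`, `attempt_merge_to_ref?`, `head_diff_outcome`, and the returned symbols are hypothetical names used only for illustration.

```ruby
# Statuses for which a merge to ref is attempted, as listed in the text.
RECHECK_STATUSES = %i[
  unchecked
  cannot_be_merged_recheck
  checking
  cannot_be_merged_rechecking
].freeze

def attempt_merge_to_ref?(merge_status)
  RECHECK_STATUSES.include?(merge_status.to_sym)
end

# A HEAD diff is generated only when the merge is attempted and has no conflict.
def head_diff_outcome(merge_status, conflict:)
  return :not_attempted unless attempt_merge_to_ref?(merge_status)

  conflict ? :no_head_diff : :head_diff_generated
end

puts head_diff_outcome(:unchecked, conflict: false)
puts head_diff_outcome(:unchecked, conflict: true)
puts head_diff_outcome(:can_be_merged, conflict: false)
```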

The flow differs from the base diff generation as it has a different entry point.

This flowchart shows a basic explanation of how each component is used when generating a HEAD diff.

Flowchart (Generating a HEAD diff, high-level view): High-level flowchart of components used when generating a HEAD diff. Nodes shown, in order: MergeRequestMergeabilityCheckWorker, MergeRequests::MergeabilityCheckService, Merge changes to ref, Gitaly, Recreate merge request HEAD diff, Database, Ensure commit SHAs, Set patch-id, Save commits, Save diffs, Object Storage, Keep around commits.

This sequence diagram shows a more detailed explanation of this flow.

Sequence diagram (Generating a HEAD diff, detail view): Detailed sequence diagram of generating a new HEAD diff.

  • Participants: MergeRequestMergeabilityCheckWorker, MergeRequests::MergeabilityCheckService, MergeRequests::MergeToRefService, Repository, Gitaly, Commit, MergeRequests::ReloadMergeHeadDiffService, MergeRequest, MergeRequestDiff, MergeRequestDiffCommit, ObjectStorage, MergeRequestDiffFile.
  • Phases: Merge changes to ref, Recreate merge request HEAD diff, Ensure commit SHAs, Set patch-id, Save commits, Save diffs (with an optional block when external diffs is enabled), Keep around commits.
  • Gitaly RPCs: UserMergeBranch, FindCommit, GetPatchID, ListCommits, WriteRef.

diffs_batch.json

The most common avenue for viewing diffs is the Changes tab at the top of merge request pages in the GitLab UI. When selected, the diffs themselves are loaded via a paginated request to /-/merge_requests/:id/diffs_batch.json, which is served by Projects::MergeRequests::DiffsController#diffs_batch.
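
The idea of loading diffs in batches can be sketched in plain Ruby. This is an illustration of batching in general, not of the endpoint's actual parameters: `batch_of`, `offset`, and `limit` are hypothetical names, and the file list is sample data.

```ruby
# Returns one batch of diff files, or an empty array once the list is exhausted.
def batch_of(diff_files, offset:, limit:)
  diff_files[offset, limit] || []
end

# Collects every batch the way a client might, stopping at the first empty one.
def load_all_batches(diff_files, limit:)
  batches = []
  offset = 0

  loop do
    batch = batch_of(diff_files, offset: offset, limit: limit)
    break if batch.empty?

    batches << batch
    offset += limit
  end

  batches
end

files = %w[a.rb b.rb c.rb d.rb e.rb]
puts load_all_batches(files, limit: 2).inspect
```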

This flowchart shows a basic explanation of how each component is used in a diffs_batch.json request.

Flowchart (Viewing a diff): High-level flowchart of a diffs_batch request, which renders diffs for browser display. Nodes shown, in order: Frontend, diffs_batch.json, Preload diffs and ivars, Gitaly, Database, Getting diff file collection, Calculate unfoldable diff lines, ETag header is not stale (Yes: Return 304. No: Serialize diffs), Redis, Return 200 with JSON.
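
The ETag branch of this flow (return 304 when the ETag is not stale, otherwise serialize and return 200) can be sketched with plain Ruby. This is a generic conditional-request sketch, not the controller's implementation: `etag_for`, `respond_to_batch`, and the inputs used to build the ETag are hypothetical.

```ruby
require "digest"
require "json"

# Hypothetical helper: derives a quoted ETag value from whatever identifies the response.
def etag_for(*cache_key_parts)
  '"' + Digest::SHA256.hexdigest(cache_key_parts.join(":")) + '"'
end

# Returns [status, headers, body]. The block that produces the diff files is
# called only when the client's copy is stale, so a 304 skips serialization.
def respond_to_batch(if_none_match:, etag:)
  return [304, { "ETag" => etag }, nil] if if_none_match == etag

  body = JSON.generate({ "diff_files" => yield })
  [200, { "ETag" => etag }, body]
end

etag = etag_for(28, "bb5206fee213d983da88c47f9cf4cc6caf9c66dc")

first_status, = respond_to_batch(if_none_match: nil, etag: etag) { ["files/ruby/feature.rb"] }
second_status, = respond_to_batch(if_none_match: etag, etag: etag) { ["files/ruby/feature.rb"] }

puts first_status
puts second_status
```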

Different cases exist when viewing diffs, though, and the flow for each case differs.

Viewing HEAD, latest or specific diff version

The HEAD diff is viewed by default, if it is available. If not, the view falls back to the latest diff version. It’s also possible to view a specific diff version. These cases have the same flow.
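
The selection order described above can be sketched in plain Ruby. `DiffVersionSketch` and `diff_to_display` are hypothetical names, and treating the highest id as the latest version is an assumption made for this sketch.

```ruby
# Hypothetical stand-in for a stored diff version.
DiffVersionSketch = Struct.new(:id, :label)

# Picks the diff to display: a specifically requested version if one is given,
# otherwise the HEAD diff when available, otherwise the latest version.
def diff_to_display(versions:, head_diff: nil, requested_id: nil)
  return versions.find { |version| version.id == requested_id } if requested_id

  head_diff || versions.max_by(&:id)
end

versions = [DiffVersionSketch.new(27, "version 1"), DiffVersionSketch.new(28, "version 2")]
head = DiffVersionSketch.new(29, "HEAD diff")

puts diff_to_display(versions: versions, head_diff: head).label
puts diff_to_display(versions: versions).label
puts diff_to_display(versions: versions, head_diff: head, requested_id: 27).label
```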

Sequence diagram (Viewing the most recent diff): Sequence diagram showing how a particular diff is chosen for display, first with the HEAD diff, then the latest diff, followed by a specific version if it's requested.

  • Participants: Frontend, #diffs_batch, #define_diff_vars, MergeRequest, MergeRequestDiff, Gitlab::Diff::FileCollection::MergeRequestDiffBatch, Gitlab::Diff::PositionCollection, Gitlab::Diff::HighlightCache, Gitlab::Diff::StatsCache, Redis, PaginatedDiffSerializer, MergeRequestDiffFile.
  • Phases: Preload diffs and ivars, Getting diff file collection, Calculate unfoldable diff lines, break with a 304 response when the ETag header is present and is not stale, Serialize diffs and render JSON.

However, if Show whitespace changes is not selected when viewing diffs:

  • Whitespace changes are ignored.
  • The flow changes, and now involves Gitaly.

Sequence diagram (Viewing diffs without whitespace changes): Sequence diagram showing how a particular diff is chosen for display, if whitespace changes are not requested: first with the HEAD diff, then the latest diff, followed by a specific version if it's requested.

  • Participants: Frontend, #diffs_batch, #define_diff_vars, MergeRequest, MergeRequestDiff, Gitlab::Diff::FileCollection::Compare, Gitlab::Diff::PositionCollection, Gitlab::Diff::FileCollection::MergeRequestDiffBatch, Gitlab::Diff::HighlightCache, Gitlab::Diff::StatsCache, Redis, PaginatedDiffSerializer, Repository, Gitaly.
  • Phases: Preload diffs and ivars, Getting diff file collection, Calculate unfoldable diff lines, break with a 304 response when the ETag header is present and is not stale, optionally cache highlights and stats when viewing HEAD, latest or specific version, Serialize diffs and render JSON.
  • Gitaly RPCs: CommitDiff.

Compare between merge request diff versions

You can also compare different diff versions when viewing diffs. The flow is different from the default flow, as it makes requests to Gitaly to generate a comparison between two diff versions. It also doesn’t use Redis for highlight and stats caches.

Sequence diagram (Comparing diffs): Sequence diagram of how diffs are compared against each other.

  • Participants: Frontend, #diffs_batch, #define_diff_vars, MergeRequestDiff, Compare, Gitlab::Diff::FileCollection::Compare, MergeRequest, Gitlab::Diff::PositionCollection, PaginatedDiffSerializer, Repository, Gitaly.
  • Phases: Preload diffs and ivars, Getting diff file collection, Calculate unfoldable diff lines, break with a 304 response when the ETag header is present and is not stale, Serialize diffs and render JSON.
  • Gitaly RPCs: CommitDiff.

Viewing commit diff

You can also view the diffs of a specific commit in a merge request. This flow differs from the default flow, and requires Gitaly to get the diff of the specific commit. It also doesn’t use Redis for the highlight and stats caches.

Sequence diagram (Viewing commit diff): Sequence diagram showing how viewing the diff of a specific commit is different from the default diff view flow.

  • Participants: Frontend, #diffs_batch, #define_diff_vars, Repository, Gitaly, Commit, Gitlab::Diff::FileCollection::Commit, MergeRequest, Gitlab::Diff::PositionCollection, PaginatedDiffSerializer.
  • Phases: Preload diffs and ivars, Getting diff file collection, Calculate unfoldable diff lines, break with a 304 response when the ETag header is present and is not stale, Serialize diffs and render JSON.
  • Gitaly RPCs: FindCommit, CommitDiff.
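
The differences between the four diffs_batch.json cases described in this section can be summarized in a plain-Ruby sketch. `diff_sources` and its keyword arguments are hypothetical names. The precedence applied when several options are combined is an assumption of this sketch, because this page does not describe combined cases.

```ruby
# Summarizes where diff data comes from and whether the Redis highlight and
# stats caches are involved, for each case described in this section.
def diff_sources(commit_id: nil, start_sha: nil, ignore_whitespace: false)
  if commit_id || start_sha
    # Viewing a commit diff, or comparing two diff versions.
    { diff_data: :gitaly, redis_caches: false }
  elsif ignore_whitespace
    # HEAD, latest, or specific version with whitespace changes hidden.
    { diff_data: :gitaly, redis_caches: true }
  else
    # HEAD, latest, or specific version with whitespace changes shown.
    { diff_data: :database, redis_caches: true }
  end
end

puts diff_sources.inspect
puts diff_sources(ignore_whitespace: true).inspect
puts diff_sources(start_sha: "0b4bc9a49b562e85de7cc9e834518ea6828729b9").inspect
puts diff_sources(commit_id: "bb5206fee213d983da88c47f9cf4cc6caf9c66dc").inspect
```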

diffs.json

It’s also possible to view diffs while creating a merge request by scrolling down to the bottom of the new merge request page and selecting the Changes tab. This view doesn’t use the diffs_batch.json endpoint, because the merge request record isn’t created yet at that point. It uses the diffs.json endpoint instead.

This flowchart shows a basic explanation of how each component is used in a diffs.json request.

Flowchart (Diff request flow, high level): High-level flowchart of the components used in a diffs request. Nodes shown, in order: Frontend, diffs.json, Build merge request, Get diffs, Render view with diffs, Gitaly, Respond with JSON with the rendered view.

This sequence diagram shows a more detailed explanation of this flow.

Sequence diagram (Diff request flow, low level): Sequence diagram with a deeper view of the components used in a diffs request.

  • Participants: Frontend, #diffs, MergeRequests::BuildService, Compare, Gitaly, MergeRequest, Gitlab::Diff::FileCollection::Compare, HAML, Repository.
  • Phases: Build merge request, Get diffs, Render view with diffs (view_to_html_string('projects/merge_requests/creations/_diffs', diffs: @diffs)), Respond with JSON with rendered view.
  • Gitaly RPCs: ListCommits, CommitDiff.