Reduce repository size

Tier: Free, Premium, Ultimate
Offering: GitLab.com, Self-managed, GitLab Dedicated

Git repositories become larger over time. When large files are added to a Git repository:

  • Fetching the repository becomes slower because everyone must download the files.
  • They take up a large amount of storage space on the server.
  • Git repository storage limits can be reached.

Rewriting a repository can remove unwanted history to make the repository smaller. We recommend git filter-repo over git filter-branch and BFG.

caution
Rewriting repository history is a destructive operation. Make sure to back up your repository before you begin. The best way to back up a repository is to export the project.

Calculate repository size

The size of a repository is determined by computing the accumulated size of all files in the repository. It is similar to executing du --summarize --bytes on your repository’s hashed storage path.
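
To get a rough local figure for comparison, you can run the same kind of du calculation against a bare clone. The following is a minimal sketch, assuming GNU du (the --summarize and --bytes long options are GNU extensions). The function name repo_size_bytes and the example path are illustrative only, and the result does not necessarily match the size GitLab reports.

```shell
# Approximate the size of a repository directory in bytes.
# Assumption: GNU du is installed; other du implementations may not
# accept the --summarize and --bytes long options.
repo_size_bytes() {
  # du prints "<bytes><TAB><path>"; keep only the first field.
  du --summarize --bytes "$1" | cut -f1
}

# Example (hypothetical path):
# repo_size_bytes /path/to/project.git
```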

Purge files from repository history

GitLab prunes unreachable objects as part of housekeeping. In GitLab, to reduce the disk size of your repository manually, you must first remove references to large files from branches, tags, and other internal references (refs) created by GitLab. These refs include:

  • refs/merge-requests/*
  • refs/pipelines/*
  • refs/environments/*
  • refs/keep-around/*

note
For details on each of these references, see GitLab-specific references.

These refs are not downloaded automatically, and hidden refs are not advertised, but we can remove them by using a project export.
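
After you have a local clone created from the project export (see the steps that follow), you might want to check how many of these internal refs it contains. The sketch below filters a list of ref names down to the four prefixes listed above. The function name gitlab_internal_refs is hypothetical, and the sketch assumes ref names arrive one per line on standard input.

```shell
# Read ref names on standard input and print only the GitLab internal
# refs (merge-requests, pipelines, environments, keep-around).
# Note: grep exits with a non-zero status when nothing matches.
gitlab_internal_refs() {
  grep -E '^refs/(merge-requests|pipelines|environments|keep-around)/'
}

# Example, run inside the bare clone created from the bundle:
# git for-each-ref --format='%(refname)' | gitlab_internal_refs | wc -l
```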

caution
This process is not suitable for removing sensitive data like passwords or keys from your repository. Information about commits, including file content, is cached in the database, and remains visible even after it has been removed from the repository.

To purge files from a GitLab repository:

  1. Install git filter-repo and optionally git-sizer using a supported package manager or from source.

  2. Generate a fresh export from the project and download it. This project export contains a backup copy of your repository and refs we can use to purge files from your repository.

    If the full project export fails to complete reliably due to the project size, you can use the Project relations export API to obtain a copy of the repository independently of the other export components.

  3. Decompress the backup using tar:

    tar xzf project-backup.tar.gz
    

    This contains a project.bundle file, which was created by git bundle.

  4. Clone a fresh copy of the repository from the bundle using --bare and --mirror options:

    git clone --bare --mirror /path/to/project.bundle
    
  5. Go to the project.git directory:

    cd project.git
    
  6. Because cloning from a bundle file sets the origin remote to the local bundle file, change it to the URL of your repository:

    git remote set-url origin https://gitlab.example.com/<namespace>/<project_name>.git
    
  7. Using either git filter-repo or git-sizer, analyze your repository and review the results to determine which items you want to purge:

    # Using git filter-repo
    git filter-repo --analyze
    head filter-repo/analysis/*-{all,deleted}-sizes.txt
    
    # Using git-sizer
    git-sizer
    
  8. Purge the history of your repository using relevant git filter-repo options. Two common options are:

    • --path and --invert-paths to purge specific files:

      git filter-repo --path path/to/file.ext --invert-paths
      
    • --strip-blobs-bigger-than to purge all files larger than a given size, for example 10M:

      git filter-repo --strip-blobs-bigger-than 10M
      

    See the git filter-repo documentation for more examples and the complete documentation.

  9. Because you are trying to remove internal refs, you need the commit-map files produced by each run to tell you which internal refs to remove. Every git filter-repo run creates a new commit-map, and overwrites the commit-map from the previous run. You can use the following command to back up each commit-map file:

    cp filter-repo/commit-map ./_filter_repo_commit_map_$(date +%s)
    

    Repeat this step and all following steps (including the repository cleanup step) every time you run any git filter-repo command.

  10. To allow force pushing the changes, unset the mirror flag:

    git config --unset remote.origin.mirror
    
  11. Force push your changes to overwrite all branches on GitLab:

    git push origin --force 'refs/heads/*'
    

    Protected branches cause this to fail. To proceed, you must remove branch protection, push, and then re-enable protected branches.

  12. To remove large files from tagged releases, force push your changes to all tags on GitLab:

    git push origin --force 'refs/tags/*'
    

    Protected tags cause this to fail. To proceed, you must remove tag protection, push, and then re-enable protected tags.

  13. To prevent dead links to commits that no longer exist, push the refs/replace created by git filter-repo:

    git push origin --force 'refs/replace/*'
    

    Refer to the Git replace documentation for information on how this works.

  14. Wait at least 30 minutes before attempting the next step.
  15. Run repository cleanup. This process only cleans up objects that are more than 30 minutes old. See Space not being freed for more information.
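
Because steps 9 to 13 must be repeated after every git filter-repo run, it can help to wrap them in a small script. The following is a sketch only, using the commands from the steps above. The function names run and push_rewritten_history and the DRY_RUN variable are hypothetical helpers introduced here for illustration. It assumes you run it from the project.git directory after git filter-repo has completed.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

push_rewritten_history() {
  # Step 9: keep a timestamped copy of the commit-map from this run.
  run cp filter-repo/commit-map "./_filter_repo_commit_map_$(date +%s)" || return 1
  # Step 10: unset the mirror flag. git config exits with a non-zero
  # status if the key is already unset, so that result is ignored.
  run git config --unset remote.origin.mirror || true
  # Steps 11 to 13: overwrite branches, tags, and replace refs on GitLab.
  # A push fails while the matching branches or tags are protected.
  for refspec in 'refs/heads/*' 'refs/tags/*' 'refs/replace/*'; do
    run git push origin --force "$refspec" || return 1
  done
}

# Preview the commands without running them:
# DRY_RUN=1 push_rewritten_history
```

The sketch stops at the first failed push so that you can remove branch or tag protection and try again. It does not cover the 30 minute wait or the repository cleanup itself.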

Repository cleanup

Repository cleanup allows you to upload a text file of objects and GitLab removes internal Git references to these objects. You can use git filter-repo to produce a list of objects (in a commit-map file) that can be used with repository cleanup.

Safely cleaning the repository requires it to be made read-only for the duration of the operation. This happens automatically, but submitting the cleanup request fails if any writes are ongoing, so cancel any outstanding git push operations before continuing.

caution
Removing internal Git references results in associated merge request commits, pipelines, and changes details no longer being available.

To clean up a repository:

  1. On the left sidebar, select Search or go to and find your project.
  2. Go to Settings > Repository.
  3. Expand Repository maintenance.
  4. Upload a list of objects. For example, a commit-map file created by git filter-repo which is located in the filter-repo directory.

    If your commit-map file is too large, the background cleanup process might time out and fail. As a result, the repository size isn’t reduced as expected. To address this, split the file and upload it in parts. Start with 20,000 lines per part and reduce as needed. For example:

    split -l 20000 filter-repo/commit-map filter-repo/commit-map-
    
  5. Select Start cleanup.

The cleanup:

  • Removes any internal Git references to old commits.
  • Runs git gc --prune=30.minutes.ago against the repository to remove unreferenced objects. Repacking your repository temporarily causes the size of your repository to increase significantly, because the old packfiles are not removed until the new packfiles have been created.
  • Unlinks any unused LFS objects attached to your project, freeing up storage space.
  • Recalculates the size of your repository on disk.

GitLab sends an email notification with the recalculated repository size after the cleanup has completed.
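
If you need to split a large commit-map file as described in the upload step, a small wrapper around the same split command can list the parts it produces so you can upload them one at a time. This is a minimal sketch. The function name split_commit_map is hypothetical, and the default of 20000 lines is only the starting value suggested above.

```shell
# Split a commit-map file into parts of at most $2 lines (default 20000)
# and print the path of each part, one per line.
# Note: parts left over from an earlier split of the same file are listed too.
split_commit_map() {
  map=$1
  lines=${2:-20000}
  split -l "$lines" "$map" "$map-" || return 1
  printf '%s\n' "$map"-*
}

# Example:
# split_commit_map filter-repo/commit-map 20000
```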

If the repository size does not decrease, this may be caused by loose objects being kept around because they were referenced in a Git operation that happened in the last 30 minutes. Try re-running these steps after the repository has been dormant for at least 30 minutes.

When using repository cleanup, note:

  • Project statistics are cached. You may need to wait 5-10 minutes to see a reduction in storage utilization.
  • The cleanup prunes loose objects older than 30 minutes. This means objects added or referenced in the last 30 minutes are not removed immediately. If you have access to the Gitaly server, you may skip that delay and run git gc --prune=now to prune all loose objects immediately.
  • This process removes some copies of the rewritten commits from the GitLab cache and database, but there are still numerous gaps in coverage and some of the copies may persist indefinitely. Clearing the instance cache may help to remove some of them, but it should not be depended on for security purposes!

Remove blobs

Permanently delete sensitive or confidential information that was accidentally committed, ensuring it’s no longer accessible in your repository’s history.

Alternatively, to replace strings with ***REMOVED***, see Redact text.

Prerequisites:

  • You must have a list of object IDs to remove. See Get a list of object IDs.

To remove blobs from your repository:

  1. On the left sidebar, select Search or go to and find your project.
  2. Select Settings > Repository.
  3. Expand Repository maintenance.
  4. Select Remove blobs.
  5. On the drawer, enter a list of blob IDs to remove, each ID on its own line.
  6. Select Remove blobs.
  7. On the confirmation dialog, enter your project path.
  8. Select Yes, remove blobs.
  9. On the left sidebar, select Settings > General.
  10. Expand the section labeled Advanced.
  11. Select Run housekeeping.

Get a list of object IDs

To remove blobs, you need a list of the object IDs to remove. To get these IDs, use the git ls-tree command.

Prerequisites:

  • You must have the repository cloned to your local machine.

For example, to get a list of files at a given commit or branch sorted by size:

  1. Open a terminal and go to your repository directory.
  2. Run the following command:

    git ls-tree -r -t --long --full-name <COMMIT/BRANCH> | sort -nk 4
    

    Example output:

    100644 blob 8150ee86f923548d376459b29afecbe8495514e9  133508 doc/howto/img/remote-development-new-workspace-button.png
    100644 blob cde4360b3d3ee4f4c04c998d43cfaaf586f09740  214231 doc/howto/img/dependency_proxy_macos_config_new.png
    100644 blob 2ad0e839a709e73a6174e78321e87021b20be445  216452 doc/howto/img/gdk-in-gitpod.jpg
    100644 blob 115dd03fc0828a9011f012abbc58746f7c587a05  242304 doc/howto/img/gitpod-button-repository.jpg
    100644 blob c41ebb321a6a99f68ee6c353dd0ed29f52c1dc80  491158 doc/howto/img/dependency_proxy_macos_config.png
    

    The third column in the output is the object ID of the blob.
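
To turn this output into a list you can paste into the Remove blobs drawer, you can filter it by size and keep only the third column. The following is a minimal sketch that assumes the column layout shown in the example output above. The function name blob_ids_at_least and the file name blob-ids.txt are hypothetical.

```shell
# Read `git ls-tree -r --long` output on standard input and print the
# object ID of every blob whose size in bytes is at least $1.
# Tree and submodule entries (size "-") are skipped by the type check.
blob_ids_at_least() {
  awk -v min="$1" '$2 == "blob" && ($4 + 0) >= (min + 0) { print $3 }'
}

# Example: list blobs of 1 MiB or more at a given commit or branch.
# git ls-tree -r -t --long --full-name <COMMIT/BRANCH> | blob_ids_at_least 1048576 > blob-ids.txt
```

Keep in mind that git ls-tree lists only the files present at the given commit or branch, so blobs that exist only in other parts of the history do not appear in the result.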

Storage limits

Repository size limits:

When a project has reached its size limit, you cannot:

  • Push to the project.
  • Create a new merge request.
  • Merge existing merge requests.
  • Upload LFS objects.

You can still:

  • Create new issues.
  • Clone the project.

If you exceed the repository size limit, you might try to:

  1. Remove some data.
  2. Make a new commit.
  3. Push back to the repository.

If these actions are insufficient, you might also try to:

  • Move some blobs to LFS.
  • Remove some old dependency updates from history.

Unfortunately, this workflow doesn’t work. Deleting files in a commit doesn’t reduce the size of the repository, because the earlier commits and blobs still exist. Instead, you must rewrite history. Use the open-source, community-maintained tool git filter-repo.

note
Until git gc runs on the GitLab side, the “removed” commits and blobs still exist. You also must be able to push the rewritten history to GitLab, which may be impossible if you’ve already exceeded the maximum size limit.

To lift these restrictions, an administrator of the self-managed GitLab instance must increase the limit on the project that exceeded it. Therefore, it’s always better to proactively stay under the limit. If you hit the limit and can’t have it temporarily increased, your only option is to:

  1. Prune all unneeded data locally.
  2. Create a new project on GitLab and start using that instead.
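
To stay under the limit proactively, you can keep an eye on the packed size of a local clone. The sketch below parses the output of git count-objects -v, which reports size-pack in KiB. The function names packed_size_bytes and under_limit, and the limit value in the example, are hypothetical. The local figure is only an approximation, because GitLab calculates repository size on the server.

```shell
# Read `git count-objects -v` output on standard input and print the
# packed size in bytes. Git reports size-pack in KiB.
packed_size_bytes() {
  awk -F': ' '$1 == "size-pack" { printf "%.0f\n", $2 * 1024 }'
}

# Succeed if the packed size read from standard input is below the
# limit in bytes given as $1.
under_limit() {
  [ "$(packed_size_bytes)" -lt "$1" ]
}

# Example with a hypothetical limit of 10 GiB:
# git count-objects -v | under_limit 10737418240 || echo "Repository is at or over the limit"
```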

Troubleshooting

Incorrect repository statistics shown in the GUI

If the displayed size or commit count is different from the exported .tar.gz or local repository, you can ask a GitLab administrator to force an update.

Using the Rails console:

p = Project.find_by_full_path('<namespace>/<project>')
pp p.statistics
p.statistics.refresh!
pp p.statistics
# compare with earlier values

# An alternate method to clear project statistics
p.repository.expire_all_method_caches
UpdateProjectStatisticsWorker.perform_async(p.id, ["commit_count","repository_size","storage_size","lfs_objects_size"])

# check the total artifact storage space separately
builds_with_artifacts = p.builds.with_downloadable_artifacts.all

artifact_storage = 0
builds_with_artifacts.find_each do |build|
  artifact_storage += build.artifacts_size
end

puts "#{artifact_storage} bytes"

Space not being freed

The process defined on this page can decrease the size of repository exports while the usage in the file system appears unchanged in both the Web UI and terminal.

The process leaves many unreachable objects remaining in the repository. Because they are unreachable, they are not included in the export, but they are still stored in the file system. These files are pruned after a grace period of two weeks. Pruning deletes these files and ensures your storage usage statistics are accurate.

To expedite this process, see the ‘Prune Unreachable Objects’ housekeeping task.

Sidekiq process fails to export a project

Tier: Free, Premium, Ultimate
Offering: Self-managed, GitLab Dedicated

Occasionally the Sidekiq process can fail to export a project, for example if it is terminated during execution.

GitLab.com users should contact Support to resolve this issue.

Self-managed users can use the Rails console to bypass the Sidekiq process and manually trigger the project export:

project = Project.find(1)
current_user = User.find_by(username: 'my-user-name')
RequestStore.begin!
ActiveRecord::Base.logger = Logger.new(STDOUT)
params = {}

::Projects::ImportExport::ExportService.new(project, current_user, params).execute(nil)

This makes the export available through the UI, but does not trigger an email to the user. To manually trigger the project export and send an email:

project = Project.find(1)
current_user = User.find_by(username: 'my-user-name')
RequestStore.begin!
ActiveRecord::Base.logger = Logger.new(STDOUT)
params = {}

ProjectExportWorker.new.perform(current_user.id, project.id)