Troubleshooting file export project migrations

Tier: Free, Premium, Ultimate
Offering: GitLab.com, GitLab Self-Managed, GitLab Dedicated

If you have problems with migrating projects by using file exports, see the possible solutions below.

Troubleshooting commands

To find the status of an import and its job ID (JID), which you can then use to search the logs, run the following in the Rails console:

Project.find_by_full_path('group/project').import_state.slice(:jid, :status, :last_error)
> {"jid"=>"414dec93f941a593ea1a6894", "status"=>"finished", "last_error"=>nil}

# Logs
grep JID /var/log/gitlab/sidekiq/current
grep "Import/Export error" /var/log/gitlab/sidekiq/current
grep "Import/Export backtrace" /var/log/gitlab/sidekiq/current
tail /var/log/gitlab/gitlab-rails/importer.log

Project fails to import due to mismatch

If the instance runners setting differs between the exported project and the destination of the import, the project fails to import. Review issue 276930, and either:

Ensure instance runners are enabled in both the source and destination projects.
Disable instance runners on the parent group when you import the project.

Users missing from imported project

If users aren’t imported with imported projects, see the preserving user contributions requirements.

A common reason for missing users is that the public email setting isn’t configured for users. To resolve this issue, ask users to configure this setting using the GitLab UI.

If there are too many users for manual configuration to be feasible, you can set all user profiles to use a public email address using the Rails console:

User.where("public_email IS NULL OR public_email = '' ").find_each do |u|
  next if u.bot?

  puts "Setting #{u.username}'s currently empty public email to #{u.email}…"
  u.public_email = u.email
  u.save!
end

Import workarounds for large repositories

Maximum import size limitations can prevent an import from being successful. If changing the import limits is not possible, you can try one of the workarounds listed here.
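
Before choosing a workaround, it can help to confirm that the export archive really is larger than the configured limit. The following is a minimal sketch: the function name exceeds_import_limit is hypothetical, and it assumes the limit is expressed in MiB, so pass whatever value applies to your instance.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) when a file is larger than a limit in MiB.
# Usage: exceeds_import_limit <file> <limit-in-MiB>
exceeds_import_limit() {
  local file="$1" limit_mib="$2"
  local size_bytes
  # wc -c behaves the same on GNU and BSD systems, unlike stat
  size_bytes=$(wc -c < "$file")
  (( size_bytes > limit_mib * 1024 * 1024 ))
}

# Example (file name and limit are placeholders):
# exceeds_import_limit my-export.tar.gz 50 && echo "Archive is over the limit"
```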

Workaround option 1

The following local workflow can be used to temporarily reduce the repository size for another import attempt:

Create a temporary working directory from the export:

EXPORT=<filename-without-extension>

mkdir "$EXPORT"
tar -xf "$EXPORT".tar.gz --directory="$EXPORT"/
cd "$EXPORT"/
git clone project.bundle

# Prevent interference with recreating an importable file later
mv project.bundle ../"$EXPORT"-original.bundle
mv ../"$EXPORT".tar.gz ../"$EXPORT"-original.tar.gz

# The clone lives in the project/ subdirectory, so create the branch there
git -C project switch --create smaller-tmp-main

To reduce the repository size, work on this smaller-tmp-main branch: identify and remove large files or interactively rebase and fixup to reduce the number of commits.
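
To decide which files to remove, one possible approach is to list the largest blobs in the repository history. This is a sketch rather than part of the documented workflow: the function name largest_blobs is hypothetical, and it must be run from inside the clone (the project directory created by git clone project.bundle).

```shell
#!/bin/bash
# Hypothetical helper: print "<size-in-bytes> <path>" for the N largest blobs
# reachable from any ref, largest first. N defaults to 10.
largest_blobs() {
  local count="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort -rn |
    head -n "$count"
}

# Example: show the five largest files ever committed
# largest_blobs 5
```

The paths in the output are candidates for removal or for moving to Git LFS. A file that was committed several times with different content appears once per version.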

# Reduce the .git/objects/pack/ file size
cd project
git reflog expire --expire=now --all
git gc --prune=now --aggressive

# Prepare recreating an importable file
git bundle create ../project.bundle <default-branch-name>
cd ..
mv project/ ../"$EXPORT"-project
cd ..

# Recreate an importable file
tar -czf "$EXPORT"-smaller.tar.gz --directory="$EXPORT"/ .

Import this new, smaller file into GitLab.
In a full clone of the original repository, use git remote set-url origin <new-url> && git push --force --all to complete the import.
Update the imported repository’s branch protection rules and its default branch, and delete the temporary, smaller-tmp-main branch, and the local, temporary data.

Workaround option 2

This workaround does not account for LFS objects.

Rather than attempting to push all changes at once, this workaround:

Separates the project import from the Git Repository import
Incrementally pushes the repository to GitLab

Make a local clone of the repository to migrate. In a later step, you push this clone outside of the project export.
Download the export and remove the project.bundle (which contains the Git repository):
```
tar -czvf new_export.tar.gz --exclude='project.bundle' @old_export.tar.gz
```
Import the export without a Git repository. GitLab asks you to confirm that you want to import without a repository.
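
The @old_export.tar.gz syntax in the tar command above, which reads entries from an existing archive, is a bsdtar (libarchive) feature and may not be available with GNU tar. If the command fails on your system, the following sketch achieves the same result by extracting and repacking. The function name strip_bundle is hypothetical, and the sketch assumes project.bundle sits at the top level of the export.

```shell
#!/bin/bash
# Hypothetical helper: copy an export archive, leaving out project.bundle.
# Usage: strip_bundle <old_export.tar.gz> <new_export.tar.gz>
strip_bundle() {
  local src="$1" dest="$2" workdir
  workdir=$(mktemp -d)
  # Unpack the original export into a scratch directory
  tar -xzf "$src" -C "$workdir"
  # Drop the Git repository bundle
  rm -f "$workdir/project.bundle"
  # Repack everything that is left
  tar -czf "$dest" -C "$workdir" .
  rm -rf "$workdir"
}

# Example:
# strip_bundle old_export.tar.gz new_export.tar.gz
```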

Save this Bash script as a file and run it after adding the appropriate origin.

#!/bin/bash

# ASSUMPTIONS:
# - The GitLab location is "origin"
# - The default branch is "main"
# - This will attempt to push in chunks of 500 MB (dividing the total size by 500 MB).
#   Decrease this size to push in smaller chunks if you still receive timeouts.

git gc
SIZE=$(git count-objects -v 2> /dev/null | grep size-pack | awk '{print $2}')

# Be conservative and try to push about 500 MB at a time. size-pack is reported in KiB.
# (This assumes each commit is the same size, which is not true in practice.)
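BATCHES=$(( SIZE / 500000 ))
TOTAL_COMMITS=$(git rev-list --count HEAD)
# A repository smaller than one chunk still needs one batch (avoids dividing by zero)
if (( BATCHES < 1 )); then
    BATCHES=1
fi
if (( BATCHES > TOTAL_COMMITS )); then
    BATCHES=$TOTAL_COMMITS
fi

# Number of commits in each batch
INCREMENTS=$(( TOTAL_COMMITS / BATCHES ))

# Push the oldest batch first, then work toward HEAD
for (( BATCH=BATCHES; BATCH>=1; BATCH-- ))
do
  COMMIT_NUM=$(( BATCH * INCREMENTS ))
  COMMIT_SHA=$(git log -n "$COMMIT_NUM" --format=format:%H | tail -1)
  git push -u origin "${COMMIT_SHA}":refs/heads/main
done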
git push -u origin main
git push -u origin --all
git push -u origin --tags
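
To estimate in advance how many batches the script would use, you can run its batch-count arithmetic on its own. This is a sketch with a hypothetical function name, batch_count. It takes the size-pack value from git count-objects -v (reported in KiB) and the commit count from git rev-list --count HEAD.

```shell
#!/bin/bash
# Hypothetical helper that mirrors the script's batch-count arithmetic.
# size_kib: the size-pack value from `git count-objects -v` (in KiB)
# total_commits: the output of `git rev-list --count HEAD`
batch_count() {
  local size_kib="$1" total_commits="$2"
  local batches=$(( size_kib / 500000 ))
  # Never use more batches than there are commits
  if (( batches > total_commits )); then
    batches=$total_commits
  fi
  echo "$batches"
}

batch_count 2300000 1000   # prints 4
batch_count 1500000 2      # prints 2 (capped at the number of commits)
batch_count 400000 1000    # prints 0 (smaller than one chunk)
```

A result of 0 means the packed repository is smaller than a single chunk, so a plain git push is likely to be enough and batching is unnecessary.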

Sidekiq process fails to export a project

Occasionally the Sidekiq process can fail to export a project, for example if it is terminated during execution.

GitLab.com users should contact Support to resolve this issue.

GitLab Self-Managed administrators can use the Rails console to bypass the Sidekiq process and manually trigger the project export:

project = Project.find(1)
current_user = User.find_by(username: 'my-user-name')
RequestStore.begin!
ActiveRecord::Base.logger = Logger.new(STDOUT)
params = {}

::Projects::ImportExport::ExportService.new(project, current_user, params).execute(nil)

This makes the export available through the UI, but does not trigger an email to the user. To manually trigger the project export and send an email:

project = Project.find(1)
current_user = User.find_by(username: 'my-user-name')
RequestStore.begin!
ActiveRecord::Base.logger = Logger.new(STDOUT)
params = {}

ProjectExportWorker.new.perform(current_user.id, project.id)

Manually execute export steps

You usually export a project through the web interface or through the API. Exporting using these methods can sometimes fail without giving enough information to troubleshoot. In these cases, open a Rails console session and loop through all the defined exporters. Execute each line individually, rather than pasting the entire block at once, so you can see any errors each command returns.

# User needs to have permission to export
u = User.find_by_username('someuser')
p = Project.find_by_full_path('some/project')
e = Projects::ImportExport::ExportService.new(p,u)

e.send(:version_saver).send(:save)
e.send(:repo_saver).send(:save)
e.send(:avatar_saver).send(:save)
e.send(:project_tree_saver).send(:save)
e.send(:uploads_saver).send(:save)
e.send(:wiki_repo_saver).send(:save)
e.send(:lfs_saver).send(:save)
e.send(:snippets_repo_saver).send(:save)
e.send(:design_repo_saver).send(:save)
## continue using `e.send(:exporter_name).send(:save)` going through the list of exporters

# The following line should show you the export_path similar to /var/opt/gitlab/gitlab-rails/shared/tmp/gitlab_exports/@hashed/49/94/4994....
s = Gitlab::ImportExport::Saver.new(exportable: p, shared: p.import_export_shared, user: u)

# Prior to GitLab 17.0, the `user` parameter was not supported. If you encounter an
# error with the above or are unsure whether or not to supply the `user`
# argument, use the following check:
Gitlab::ImportExport::Saver.instance_method(:initialize).parameters.include?([:keyreq, :user])
# If the preceding check returns false, omit the user argument:
s = Gitlab::ImportExport::Saver.new(exportable: p, shared: p.import_export_shared)

# To try and upload use:
s.send(:compress_and_save)
s.send(:save_upload)

After the project is successfully uploaded, the exported project is located in a .tar.gz file in /var/opt/gitlab/gitlab-rails/uploads/-/system/import_export_upload/export_file/.

Error: `PG::QueryCanceled: ERROR: canceling statement due to statement timeout`

Some migrations can time out with the error PG::QueryCanceled: ERROR: canceling statement due to statement timeout. One way to avoid this problem is to reduce the migration batch size. A smaller batch size makes a migration less likely to time out, but also makes the migration slower.

To reduce the batch size, you must enable a feature flag. For more information, see issue 456948.

Error: `command exited with error code 15 and Unable to save [FILTERED] into [FILTERED]`

You might get the following error in logs when you migrate projects by using file exports:

command exited with error code 15 and Unable to save [FILTERED] into [FILTERED]

This error occurs during export or import when Sidekiq receives a SIGTERM, often while executing the tar command.

In Kubernetes environments like GitLab.com and GitLab Dedicated, the operating system triggers SIGTERM signals due to memory or disk shortage, code deployments, or instance upgrades. To identify the root cause, an administrator should investigate why Kubernetes terminated the instance.

In non-Kubernetes environments, this error might occur if the instance is terminated while executing the tar command. However, this error does not occur due to disk shortage, so memory shortage is the most likely cause.

If you get this error:

When you export a file, GitLab retries the export until the maximum number of retries is reached and then marks the export as failed. For GitLab.com, try the export during the weekend, when the instance is under less load.
When you import a file, you must retry the import yourself. GitLab does not retry the import automatically.

Troubleshooting performance issues

The following sections describe the current performance problems with Import/Export.

OOM errors

Out of memory (OOM) errors are usually caused by the Sidekiq Memory Killer:

SIDEKIQ_MEMORY_KILLER_MAX_RSS = 2000000
SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS = 3000000
SIDEKIQ_MEMORY_KILLER_GRACE_TIME = 900
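
Assuming the two RSS values are expressed in kilobytes and the grace time in seconds (check this against the Sidekiq Memory Killer documentation for your version), the following sketch converts the limits to GiB and compares a process's memory use against the soft limit. The function names kb_to_gib and over_soft_limit are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers; the default limit is the MAX_RSS value shown above (assumed KB).

# Convert a value in KB to GiB with two decimal places
kb_to_gib() {
  LC_ALL=C awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / 1024 / 1024 }'
}

# Succeed when an RSS value (KB) is above the soft limit (KB)
over_soft_limit() {
  local rss_kb="$1" limit_kb="${2:-2000000}"
  (( rss_kb > limit_kb ))
}

kb_to_gib 2000000   # prints 1.91
kb_to_gib 3000000   # prints 2.86

# Example: check one process by PID (ps reports RSS in KB on Linux)
# rss_kb=$(ps -o rss= -p "$PID" | tr -d ' ')
# over_soft_limit "$rss_kb" && echo "PID $PID is above the soft limit"
```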

An import status of started, combined with the following Sidekiq log entry, signals a memory issue:

WARN: Work still in progress <struct with JID>

Timeouts

Timeout errors occur due to the Gitlab::Import::StuckProjectImportJobsWorker marking the process as failed:

module Gitlab
  module Import
    class StuckProjectImportJobsWorker
      include Gitlab::Import::StuckImportJob
      # ...
    end
  end
end

module Gitlab
  module Import
    module StuckImportJob
      # ...
      IMPORT_JOBS_EXPIRATION = 15.hours.to_i
      # ...
      def perform
        stuck_imports_without_jid_count = mark_imports_without_jid_as_failed!
        stuck_imports_with_jid_count = mark_imports_with_jid_as_failed!

        track_metrics(stuck_imports_with_jid_count, stuck_imports_without_jid_count)
      end
      # ...
    end
  end
end

Marked stuck import jobs as failed. JIDs: xyz
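
The IMPORT_JOBS_EXPIRATION constant in the excerpt above works out to 54000 seconds. As a rough sketch, the following checks whether an import has been running longer than that window. The function name import_window_elapsed is hypothetical, and measuring from the import start time is an assumption: the worker's exact criteria are elided in the excerpt.

```shell
#!/bin/bash
# 15.hours.to_i in the Ruby excerpt is 15 * 60 * 60 = 54000 seconds
IMPORT_JOBS_EXPIRATION=$(( 15 * 60 * 60 ))

# Hypothetical helper: succeed when more than the expiration window has passed.
# Both arguments are Unix timestamps in seconds; "now" defaults to the current time.
import_window_elapsed() {
  local started_at="$1" now="${2:-$(date +%s)}"
  (( now - started_at > IMPORT_JOBS_EXPIRATION ))
}

echo "$IMPORT_JOBS_EXPIRATION"   # prints 54000

# Example (the timestamp is a placeholder):
# import_window_elapsed 1700000000 && echo "Import is older than 15 hours"
```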

  +-----------+    +-----------------------------------+
  |Export Job |--->| Calls ActiveRecord `as_json` and  |
  +-----------+    | `to_json` on all project models   |
                   +-----------------------------------+

  +-----------+    +-----------------------------------+
  |Import Job |--->| Loads all JSON in memory, then    |
  +-----------+    | inserts into the DB in batches    |
                   +-----------------------------------+

Problems and solutions

Slow JSON loading and dumping of models from the database:

Split the worker
Batch export
Optimize SQL
Move away from ActiveRecord callbacks (difficult)

High memory usage (see also some analysis):

DB Commit sweet spot that uses less memory
Netflix Fast JSON API may help
Batch reading/writing to disk and any SQL