Back up and Restore GitLab with gitlab-backup-cli
- Introduced in GitLab 17.0. This feature is an experiment and subject to the GitLab Testing Agreement.
This tool is under development and is ultimately meant to replace the Rake tasks used for backing up and restoring GitLab. You can follow the development of this tool in the epic: Next Gen Scalable Backup and Restore.
Feedback on the tool is welcome in the feedback issue.
Taking a backup
To take a backup of the current GitLab installation:
sudo gitlab-backup-cli backup all
Backing up object storage
Only Google Cloud is supported. See epic 11577 for the plan to add more vendors.
GCP
gitlab-backup-cli creates and runs jobs with Google's Storage Transfer Service to copy GitLab data to a separate backup bucket.
Prerequisites:
- Follow Google’s documentation for authentication.
- This document assumes you are setting up and using a dedicated Google Cloud service account for managing backups.
- If no other credentials are provided and you are running inside Google Cloud, the tool attempts to use the access of the infrastructure it is running on. We recommend running the tool with separate credentials and restricting the application's access to the created backups.
To create a backup:
- Create a role:
  - Create a file role.yaml with the following definition:
      ---
      description: Role for backing up GitLab object storage
      includedPermissions:
        - storagetransfer.jobs.create
        - storagetransfer.jobs.get
        - storagetransfer.jobs.run
        - storagetransfer.jobs.update
        - storagetransfer.operations.get
        - storagetransfer.projects.getServiceAccount
      stage: GA
      title: GitLab Backup Role
  - Apply the role:
      gcloud iam roles create --project=<YOUR_PROJECT_ID> <ROLE_NAME> --file=role.yaml
- Create a service account for backups, and add it to the role:
    gcloud iam service-accounts create "gitlab-backup-cli" --display-name="GitLab Backup Service Account"
    # Get the service account email from the output of the following
    gcloud iam service-accounts list
    # Add the account to the role created previously
    # (custom project roles are referenced as projects/<PROJECT_ID>/roles/<ROLE_NAME>)
    gcloud projects add-iam-policy-binding <YOUR_PROJECT_ID> --member="serviceAccount:<SERVICE_ACCOUNT_EMAIL>" --role="projects/<YOUR_PROJECT_ID>/roles/<ROLE_NAME>"
- Follow Google’s documentation for authentication with the service account. In general, the credentials can be saved to a file, or stored in a predefined environment variable.
- Create a destination bucket to back up to in Google Cloud Storage. The options here depend heavily on your requirements.
- Run the backup:
    sudo gitlab-backup-cli backup all --backup-bucket=<BUCKET_NAME>
  If you want to back up the container registry bucket, add the option --registry-bucket=<REGISTRY_BUCKET_NAME>.
- The backup creates a backup under backups/<BACKUP_ID>/<BUCKET> for each of the object storage types in the bucket.
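The final step can be wrapped in a small helper so the command line can be reviewed before it is run. The following is a minimal sketch, not part of the tool: the function name build_backup_command is hypothetical, and it only assembles the two options documented above (--backup-bucket and the optional --registry-bucket).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with gitlab-backup-cli): assembles the
# backup command from the options documented above and prints it.
build_backup_command() {
  local backup_bucket="${1:-}"
  local registry_bucket="${2:-}"

  if [ -z "$backup_bucket" ]; then
    echo "usage: build_backup_command <BUCKET_NAME> [<REGISTRY_BUCKET_NAME>]" >&2
    return 1
  fi

  local cmd="sudo gitlab-backup-cli backup all --backup-bucket=${backup_bucket}"
  # The registry bucket is optional, so only add the flag when it was given.
  if [ -n "$registry_bucket" ]; then
    cmd="${cmd} --registry-bucket=${registry_bucket}"
  fi
  printf '%s\n' "$cmd"
}
```

Printing the command instead of executing it keeps the sketch side-effect free; the bucket names passed in are placeholders for your own.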
Backup directory structure
Example backup directory structure:
backups
└── 1714053314_2024_04_25_17.0.0-pre
├── artifacts.tar.gz
├── backup_information.json
├── builds.tar.gz
├── ci_secure_files.tar.gz
├── db
│ ├── ci_database.sql.gz
│ └── database.sql.gz
├── lfs.tar.gz
├── packages.tar.gz
├── pages.tar.gz
├── registry.tar.gz
├── repositories
│ ├── default
│ │ ├── @hashed
│ │ └── @snippets
│ └── manifests
│ └── default
├── terraform_state.tar.gz
└── uploads.tar.gz
The db directory is used to back up the GitLab PostgreSQL database using pg_dump to create an SQL dump. The output of pg_dump is piped through gzip in order to create a compressed SQL file.
The repositories directory is used to back up Git repositories, as found in the GitLab database.
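Because the database dumps are gzip-compressed, a quick sanity check after a backup is to confirm that each file is a readable gzip stream. The following is a rough sketch under that assumption; check_db_dumps is a hypothetical helper, not a gitlab-backup-cli command, and it does not validate the SQL inside the dump.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks that every *.sql.gz file under a backup's db
# directory is a readable gzip stream. It does not validate the SQL itself.
check_db_dumps() {
  local db_dir="${1:-}"
  local dump
  local status=0
  for dump in "$db_dir"/*.sql.gz; do
    # With no matches the glob stays literal, so bail out on non-existent paths.
    [ -e "$dump" ] || { echo "no dumps found in $db_dir" >&2; return 1; }
    if gzip -t "$dump" 2>/dev/null; then
      echo "ok: $dump"
    else
      echo "corrupt: $dump" >&2
      status=1
    fi
  done
  return "$status"
}
```

For the example structure above, this would be called as check_db_dumps backups/1714053314_2024_04_25_17.0.0-pre/db.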
Backup ID
Backup IDs identify individual backups. You need the backup ID of a backup archive if you need to restore GitLab and multiple backups are available.
Backups are saved in the directory set in backup_path, which is specified in the config/gitlab.yml file.
- By default, backups are stored in /var/opt/gitlab/backups.
- By default, each backup directory is named after its backup ID (<backup-id>), which identifies the time when the backup was created and the GitLab version.
For example, if the backup directory name is 1714053314_2024_04_25_17.0.0-pre, the time of creation is represented by 1714053314_2024_04_25 and the GitLab version is 17.0.0-pre.
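To make the naming scheme concrete, the sketch below splits a backup ID into its parts. It assumes the layout seen in the example, that is, a leading number, a YYYY_MM_DD date, and then the version; the function name parse_backup_id is hypothetical, and the format is inferred from the example rather than from a documented specification.

```shell
#!/usr/bin/env bash
# Hypothetical helper: splits a backup ID of the assumed form
# <number>_<YYYY>_<MM>_<DD>_<version> into its parts.
parse_backup_id() {
  local id="${1:-}"
  local epoch rest date version
  epoch="${id%%_*}"   # text before the first underscore
  rest="${id#*_}"     # remainder: YYYY_MM_DD_<version>
  date="$(printf '%s' "$rest" | cut -d_ -f1-3)"
  version="$(printf '%s' "$rest" | cut -d_ -f4-)"
  printf 'epoch=%s\ndate=%s\nversion=%s\n' "$epoch" "$date" "$version"
}

parse_backup_id 1714053314_2024_04_25_17.0.0-pre
# epoch=1714053314
# date=2024_04_25
# version=17.0.0-pre
```

The leading number in the example is consistent with a Unix timestamp for the creation time, but treat that reading as an assumption.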
Backup metadata file (backup_information.json)
- Metadata version 2 was introduced in GitLab 16.11.
backup_information.json is found in the backup directory, and it stores metadata about the backup. For example:
{
"metadata_version": 2,
"backup_id": "1714053314_2024_04_25_17.0.0-pre",
"created_at": "2024-04-25T13:55:14Z",
"gitlab_version": "17.0.0-pre"
}
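As an illustration of how the metadata can be used, the sketch below reads a field from backup_information.json and checks that a backup directory's name matches the backup_id recorded inside it. Both function names are hypothetical, and the sed expression assumes the flat, one-field-per-line layout shown above; a real JSON parser is the safer choice for anything more complex.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the value of a top-level field from
# backup_information.json, assuming one "key": value pair per line.
metadata_field() {
  local file="${1:-}"
  local key="${2:-}"
  sed -n "s/^[[:space:]]*\"${key}\"[[:space:]]*:[[:space:]]*\"\{0,1\}\([^\",]*\)\"\{0,1\},\{0,1\}[[:space:]]*$/\1/p" "$file"
}

# Hypothetical helper: succeeds when the directory name equals the backup_id
# stored in that directory's backup_information.json.
check_backup_dir() {
  local dir="${1%/}"
  local expected actual
  expected="$(basename "$dir")"
  actual="$(metadata_field "$dir/backup_information.json" backup_id)"
  [ -n "$actual" ] && [ "$expected" = "$actual" ]
}
```

For example, check_backup_dir /var/opt/gitlab/backups/<BACKUP_ID> returns a zero exit status only when the directory has not been renamed since the backup was taken.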
Known issues
When working with gitlab-backup-cli, you might encounter the following issues.
Architecture compatibility
If you use the gitlab-backup-cli tool on architectures other than the 1K architecture, you might experience issues. This tool is supported only on the 1K architecture and is recommended only for relevant environments.
Backup strategy
Changes to existing files during backup might cause issues on the GitLab instance. This issue occurs because the tool’s initial version does not use the copy strategy.
To work around this issue, either:
- Transition the GitLab instance into Maintenance Mode.
- Restrict traffic to the servers during backup to preserve instance resources.
We’re investigating an alternative to the copy strategy. For details, see issue 428520.
What data is backed up?
- Git Repository Data
- Databases
- Blobs
What data is NOT backed up?
- Secrets and Configurations
  - Follow the documentation on how to back up secrets and configuration.
- Transient and Cache Data
  - Redis: Cache
  - Redis: Sidekiq Data
  - Logs
  - Elasticsearch
  - Observability Data / Prometheus Metrics