Backup and restore a GitLab instance
GitLab Helm chart provides a utility pod from the Toolbox sub-chart that acts as an interface for the purpose of backing up and restoring GitLab instances. It is equipped with a backup-utility
executable which interacts with other necessary pods for this task.
Technical details for how the utility works can be found in the architecture documentation.
Prerequisites
-
Backup and Restore procedures described here have only been tested with S3 compatible APIs. Support for other object storage services, like Google Cloud Storage, will be tested in future revisions.
-
During restoration, the backup tarball needs to be extracted to disk. This means the Toolbox pod should have disk of necessary size available.
-
This chart relies on the use of object storage for
artifacts
,uploads
,packages
,registry
andlfs
objects, and does not currently migrate these for you during restore. If you are restoring a backup taken from another instance, you must migrate your existing instance to using object storage before taking the backup. See issue 646.
Backup and Restoring procedures
Object storage
We provide a MinIO instance out of the box when using this charts unless an external object storage is specified. The Toolbox connects to the included MinIO by default, unless specific settings are given. The Toolbox can also be configured to back up to Amazon S3 or Google Cloud Storage (GCS).
Backups to S3
The Toolbox uses s3cmd
by default to connect to object storage unless you specify another s3 tool to use. In order to configure connectivity to external object storage gitlab.toolbox.backups.objectStorage.config.secret
should be specified which points to a Kubernetes secret containing a .s3cfg
file. gitlab.toolbox.backups.objectStorage.config.key
should be specified if different from the default of config
. This points to the key containing the contents of a .s3cfg
file.
It should look like this:
helm install gitlab gitlab/gitlab \
--set gitlab.toolbox.backups.objectStorage.config.secret=my-s3cfg \
--set gitlab.toolbox.backups.objectStorage.config.key=config .
s3cmd .s3cfg
file documentation can be found here
In addition, two bucket locations need to be configured, one for storing the backups, and one temporary bucket that is used when restoring a backup.
--set global.appConfig.backups.bucket=gitlab-backup-storage
--set global.appConfig.backups.tmpBucket=gitlab-tmp-storage
Backups to Google Cloud Storage (GCS)
To backup to GCS, you must first set gitlab.toolbox.backups.objectStorage.backend
to gcs
. This ensures
that the Toolbox uses the gsutil
CLI when storing and retrieving
objects.
In addition, two bucket locations need to be configured, one for storing the backups, and one temporary bucket that is used when restoring a backup.
--set global.appConfig.backups.bucket=gitlab-backup-storage
--set global.appConfig.backups.tmpBucket=gitlab-tmp-storage
The backup utility needs access to these buckets. There are two ways to grant access:
- Specifying credentials in a Kubernetes secret.
- Configuring Workload Identity Federation for GKE.
GCS credentials
First, set gitlab.toolbox.backups.objectStorage.config.gcpProject
to the project ID of the GCP project that contains your storage buckets.
You must create a Kubernetes secret with the contents of an active service account JSON key where the service account has the storage.admin
role for the buckets
you will use for backup. Below is an example of using the gcloud
and kubectl
to create the secret.
export PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts create gitlab-gcs --display-name "Gitlab Cloud Storage"
gcloud projects add-iam-policy-binding --role roles/storage.admin ${PROJECT_ID} --member=serviceAccount:gitlab-gcs@${PROJECT_ID}.iam.gserviceaccount.com
gcloud iam service-accounts keys create --iam-account gitlab-gcs@${PROJECT_ID}.iam.gserviceaccount.com storage.config
kubectl create secret generic storage-config --from-file=config=storage.config
Configure your Helm chart as follows to use the service account key to authenticate to GCS for backups:
helm install gitlab gitlab/gitlab \
--set gitlab.toolbox.backups.objectStorage.config.secret=storage-config \
--set gitlab.toolbox.backups.objectStorage.config.key=config \
--set gitlab.toolbox.backups.objectStorage.config.gcpProject=my-gcp-project-id \
--set gitlab.toolbox.backups.objectStorage.backend=gcs
Configuring Workload Identity Federation for GKE
See the documentation on Workload Identity Federation for GKE using the GitLab chart.
When creating an IAM allow policy that references the Kubernetes ServiceAccount, grant the roles/storage.objectAdmin
role.
For backups, ensure that Google’s Application Default Credentials are used by making sure that
gitlab.toolbox.backups.objectStorage.config.secret
and gitlab.toolbox.backups.objectStorage.config.key
are NOT set.
Backups to Azure blob storage
Azure blob storage can be used to store backups by setting
gitlab.toolbox.backups.objectStorage.backend
to azure
. This enables
Toolbox to use the included copy of azcopy
to transmit and retrieve the
backup files to the Azure blob storage.
To use Azure blob storage, one will need to create a storage account in an existing resource group. Create a config secret with your storage account’s name, access key and blob host.
Create a config file containing the paramters:
# azure-backup-conf.yaml
azure_storage_account_name: <storage account>
azure_storage_access_key: <access key value>
azure_storage_domain: blob.core.windows.net # optional
The following kubectl
command can be used to create the Kubernetes Secret:
kubectl create secret generic backup-azure-creds \
--from-file=config=azure-backup-conf.yaml
Once the Secret has been created, the GitLab Helm chart can be configured by adding the backup settings to your deployed values or by supplying the settings on the Helm command line. For example:
helm install gitlab gitlab/gitlab \
--set gitlab.toolbox.backups.objectStorage.config.secret=backup-azure-creds \
--set gitlab.toolbox.backups.objectStorage.config.key=config \
--set gitlab.toolbox.backups.objectStorage.backend=azure
The access key from the Secret is used to generate and refresh shorter-lived shared access signature (SAS) tokens to access the storage account.
In addition, two buckets/containers need to be created beforehand, one for storing the backups, and one temporary bucket that is used when restoring a backup. Add the bucket names to your values or settings. For example:
--set global.appConfig.backups.bucket=gitlab-backup-storage
--set global.appConfig.backups.tmpBucket=gitlab-tmp-storage
Troubleshooting
Pod eviction issues
As the backups are assembled locally outside of the object storage target, temporary disk space is needed. The required space might exceed the size of the actual backup archive. The default configuration will use the Toolbox pod’s file system to store the temporary data. If you find pod being evicted due to low resources, you should attach a persistent volume to the pod to hold the temporary data. On GKE, add the following settings to your Helm command:
--set gitlab.toolbox.persistence.enabled=true
If your backups are being run as part of the included backup cron job, then you will want to enable persistence for the cron job as well:
--set gitlab.toolbox.backups.cron.persistence.enabled=true
For other providers, you may need to create a persistent volume. See our Storage documentation for possible examples on how to do this.
“Bucket not found” errors
If you see Bucket not found
errors during backups, check the
credentials are configured for your bucket.
The command depends on the cloud service provider:
-
For AWS S3, the credentials are stored on the toolbox pod in
~/.s3cfg
. Run:s3cmd ls
-
For GCP GCS, run:
gsutil ls
You should see a list of available buckets.
“AccessDeniedException: 403” errors in GCP
An error like [Error] AccessDeniedException: 403 <GCP Account> does not have storage.objects.list access to the Google Cloud Storage bucket.
usually happens during a backup or restore of a GitLab instance, because of missing permissions.
The backup and restore operations use all buckets in the environment, so confirm that all buckets in your environment have been created, and that the GCP account can access (list, read, and write) all buckets:
-
Find your toolbox pod:
kubectl get pods -lrelease=RELEASE_NAME,app=toolbox
-
Get all buckets in the pod’s environment. Replace
<toolbox-pod-name>
with your actual toolbox pod name, but leave"BUCKET_NAME"
as it is:kubectl describe pod <toolbox-pod-name> | grep "BUCKET_NAME"
-
Confirm that you have access to every bucket in the environment:
# List gsutil ls gs://<bucket-to-validate>/ # Read gsutil cp gs://<bucket-to-validate>/<object-to-get> <save-to-location> # Write gsutil cp -n <local-file> gs://<bucket-to-validate>/
“ERROR: /home/git/.s3cfg
: None” error when running backup-utility
with --backend s3
This error happens when a Kubernetes secret containing a .s3cfg
file was not specified through the gitlab.toolbox.backups.objectStorage.config.secret
value.
To fix this, follow the instructions in backups to S3.
“PermissionError: File not writable” errors using S3
An error like [Error] WARNING: <file> not writable: Operation not permitted
happens if the toolbox user does not have
permissions to write files that match the stored permissions of the bucket items.
To prevent this, configure s3cmd
not to preserve file owner, mode and timestamps by adding the
following flag to your .s3cfg
file referenced via gitlab.toolbox.backups.objectStorage.config.secret
.
preserve_attrs = False
Repositories skipped on restore
Starting with GitLab 16.6/Chart 7.6 repositories may be skipped on restore if the backup archive has been renamed.
To avoid this, do not rename backup archives and rename backups to their original names ({backup_id}_gitlab_backup.tar
).
The original backup ID can be extracted from the repository backup directory structure: repositories/@hashed/*/*/*/{backup_id}/LATEST