Troubleshooting

UPGRADE FAILED: Job failed: BackoffLimitExceeded

If you received this error when upgrading to the 6.0 version of the chart, you probably skipped a step in the upgrade path: you must first upgrade to the latest 5.10.x version. To recover:

  1. List all your releases to identify your GitLab Helm release name (you will need to include -n <namespace> if your release was not deployed to the default K8s namespace):

     helm ls 
    
  2. Assuming that your GitLab Helm release is called gitlab, look at the release history and identify the last successful revision (the status of each revision is shown under DESCRIPTION):

     helm history gitlab
    
  3. Assuming your most recent successful revision is 1, use this command to roll back:

    helm rollback gitlab 1
    
  4. Re-run the upgrade command, replacing <x> with the appropriate chart version:

    helm upgrade --version=5.10.<x> gitlab gitlab/gitlab <other_options>
    
  5. You can now upgrade to 6.x. Use the --version option to pass a specific 6.x.x chart version, or omit the option to upgrade to the latest version of GitLab:

    helm upgrade --install gitlab gitlab/gitlab <other_options>
    

More information about command line arguments can be found in our Deploy using Helm section. For mappings between chart versions and GitLab versions, read GitLab version mappings.
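
Steps 2 and 3 can be scripted. The following is a minimal sketch, not part of the chart: last_good_revision is a hypothetical helper that reads the table printed by helm history and prints the highest revision whose STATUS is deployed or superseded. It assumes the Helm 3 table layout, in which the UPDATED column holds five space-separated tokens, so STATUS is the seventh field. Confirm the result against the DESCRIPTION column before rolling back.

```shell
# Hypothetical helper: print the highest revision whose STATUS column is
# "deployed" or "superseded". Reads the `helm history <release>` table on stdin.
# Assumption: UPDATED looks like "Mon Oct  3 10:15:13 2016" (five tokens),
# which makes STATUS the seventh whitespace-separated field.
last_good_revision() {
  awk 'NR > 1 && ($7 == "deployed" || $7 == "superseded") { rev = $1 }
       END { if (rev != "") print rev }'
}
```

For example, assuming a release named gitlab, running helm history gitlab | last_good_revision prints the revision number to pass to helm rollback. If no revision qualifies, the helper prints nothing.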

UPGRADE FAILED: “$name” has no deployed releases

This error occurs on your second install/upgrade if your initial install failed.

If your initial install completely failed, and GitLab was never operational, you should first purge the failed install before installing again.

helm uninstall <release-name>

If, instead, the initial install command timed out but GitLab still came up successfully, add the --force flag to the helm upgrade command to ignore the error and attempt to update the release.

Otherwise, if you received this error after having previously had successful deploys of the GitLab chart, then you are encountering a bug. Please open an issue on our issue tracker, and also check out issue #630 where we recovered our CI server from this problem.
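
To decide which of these cases applies, it can help to check whether Helm lists the release as failed. The sketch below is an illustration only: release_is_failed is a hypothetical helper, and the namespace argument defaulting to default is an assumption. A release stuck in another state (for example, a pending install) would not be listed by this check.

```shell
# Hypothetical helper: succeed if the named release is listed as failed.
# `helm ls --failed --short` prints only the names of failed releases.
release_is_failed() {
  release="$1"
  ns="${2:-default}"
  # -F: fixed string, -x: match the whole line, -q: no output
  helm ls --failed --short -n "$ns" | grep -Fqx -- "$release"
}
```

A call such as release_is_failed gitlab returns success when the release named gitlab appears in the failed list, which can be used in an if statement before running helm uninstall.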

Error: this command needs 2 arguments: release name, chart path

An error like this could occur when you run helm upgrade and there are some spaces in the parameters. In the following example, Test Username is the culprit:

helm upgrade gitlab gitlab/gitlab --timeout 600s --set global.email.display_name=Test Username ...

To fix it, pass the parameters in single quotes:

helm upgrade gitlab gitlab/gitlab --timeout 600s --set global.email.display_name='Test Username' ...
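
The splitting is done by the shell before Helm ever sees the command line: an unquoted space turns one value into two arguments, and the extra word is then treated as a positional argument. A minimal illustration, where count_args is a hypothetical helper that only reports how many arguments it received:

```shell
# Print how many arguments the shell passes to a command.
count_args() {
  echo "$#"
}

# Unquoted: the space splits the value, so "Username" becomes a separate argument.
count_args --set global.email.display_name=Test Username     # prints 3

# Quoted: the whole key=value pair stays in one argument.
count_args --set global.email.display_name='Test Username'   # prints 2
```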

Application containers constantly initializing

If Sidekiq, Webservice, or other Rails-based containers remain in a constant state of Initializing, they are likely waiting on the dependencies container to pass.

If you check the logs of a given Pod specifically for the dependencies container, you may see the following repeated:

Checking database connection and schema version
WARNING: This version of GitLab depends on gitlab-shell 8.7.1, ...
Database Schema
Current version: 0
Codebase version: 20190301182457

This indicates that the migrations Job has not yet completed. The purpose of this Job is to ensure both that the database is seeded and that all relevant migrations are in place. The application containers wait for the database to be at or above their expected schema version, which ensures that the application does not malfunction due to the schema not matching the expectations of the codebase.

  1. Find the migrations Job: kubectl get job -lapp=migrations
  2. Find the Pod being run by the Job: kubectl get pod -ljob-name=<job-name>
  3. Examine the output, checking the STATUS column.

If the STATUS is Running, continue. If the STATUS is Completed, the application containers should start shortly after the next check passes.

Examine the logs from this Pod: kubectl logs <pod-name>

Any failures during the run of this job should be addressed. These will block the use of the application until resolved. Possible problems are:

  • Unreachable or failed authentication to the configured PostgreSQL database
  • Unreachable or failed authentication to the configured Redis services
  • Failure to reach a Gitaly instance
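
The lookup steps above can be combined into one helper. This is a minimal sketch under stated assumptions: migrations_report is a hypothetical name, the namespace argument defaults to default, and the newest Job is taken to be the relevant one. It prints the Pod status followed by the last 50 log lines.

```shell
# Hypothetical helper: show the Pod status and recent logs for the newest
# migrations Job. The namespace argument is optional.
migrations_report() {
  ns="${1:-default}"
  # -o name prints e.g. "job.batch/<job-name>"; sorting by creation time
  # and keeping the last line selects the newest Job.
  job="$(kubectl get job -n "$ns" -l app=migrations \
           --sort-by=.metadata.creationTimestamp -o name | tail -n 1)"
  if [ -z "$job" ]; then
    echo "no migrations Job found in namespace $ns" >&2
    return 1
  fi
  job_name="${job##*/}"   # strip the "job.batch/" prefix
  kubectl get pod -n "$ns" -l "job-name=$job_name"
  kubectl logs -n "$ns" -l "job-name=$job_name" --tail=50
}
```

For example, migrations_report gitlab inspects the migrations Job in a namespace called gitlab (an assumed name).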

Applying configuration changes

The following command will perform the necessary operations to apply any updates made to gitlab.yaml:

helm upgrade <release name> <chart path> -f gitlab.yaml

Included GitLab Runner failing to register

This can happen when the runner registration token has been changed in GitLab, which often happens after you have restored a backup.

  1. Find the new shared runner token located on the admin/runners webpage of your GitLab installation.
  2. Find the name of the existing runner token Secret stored in Kubernetes:

    kubectl get secrets | grep gitlab-runner-secret
    
  3. Delete the existing Secret:

    kubectl delete secret <runner-secret-name>
    
  4. Create the new Secret with two keys: runner-registration-token with your shared token, and an empty runner-token:

    kubectl create secret generic <runner-secret-name> --from-literal=runner-registration-token=<new-shared-runner-token> --from-literal=runner-token=""
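
Steps 3 and 4 can be wrapped in a single helper so the Secret is never left deleted by accident. This is a sketch only: recreate_runner_secret is a hypothetical name, and it operates in the namespace of the current kubectl context, as the commands above do.

```shell
# Hypothetical helper: replace the runner Secret with one holding the new
# registration token and an empty runner-token.
recreate_runner_secret() {
  secret_name="$1"
  registration_token="$2"
  if [ -z "$secret_name" ] || [ -z "$registration_token" ]; then
    echo "usage: recreate_runner_secret <runner-secret-name> <new-shared-runner-token>" >&2
    return 2
  fi
  # --ignore-not-found keeps the delete from failing if the Secret is already gone.
  kubectl delete secret "$secret_name" --ignore-not-found || return 1
  kubectl create secret generic "$secret_name" \
    --from-literal=runner-registration-token="$registration_token" \
    --from-literal=runner-token=""
}
```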
    

Too many redirects

This can happen when you have TLS termination before the NGINX Ingress, and the tls-secrets are specified in the configuration.

  1. Update your values to set global.ingress.annotations."nginx.ingress.kubernetes.io/ssl-redirect": "false"

    Via a values file:

    # values.yaml
    global:
      ingress:
        annotations:
          "nginx.ingress.kubernetes.io/ssl-redirect": "false"
    

    Via the Helm CLI:

    helm ... --set-string global.ingress.annotations."nginx.ingress.kubernetes.io/ssl-redirect"=false
    
  2. Apply the change.

Note: When using an external service for SSL termination, that service is responsible for redirecting to https (if so desired).
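
After applying the change, you may want to confirm that the annotation reached the Ingress objects. The following is a sketch, assuming kubectl access to the cluster: show_ssl_redirect is a hypothetical helper and the namespace argument defaults to default. It prints each Ingress name with the value of its ssl-redirect annotation (empty if the annotation is not set).

```shell
# Hypothetical helper: list each Ingress with its ssl-redirect annotation.
# Dots inside the annotation key are escaped with backslashes for JSONPath.
show_ssl_redirect() {
  ns="${1:-default}"
  kubectl get ingress -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.nginx\.ingress\.kubernetes\.io/ssl-redirect}{"\n"}{end}'
}
```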

Upgrades fail with Immutable Field Error

spec.clusterIP

Prior to the 3.0.0 release of these charts, the spec.clusterIP property had been populated into several Services despite having no actual value (""). This was a bug, and causes problems with Helm 3’s three-way merge of properties.

Once the chart was deployed with Helm 3, there was no possible upgrade path unless you either collected the clusterIP properties from the various Services and populated them into the values provided to Helm, or removed the affected Services from Kubernetes.

The 3.0.0 release of this chart corrected this error, but existing deployments still require manual correction.

The simplest fix is to remove all of the affected Services and then upgrade.

  1. Remove all affected services:

    kubectl delete services -lrelease=RELEASE_NAME
    
  2. Perform an upgrade via Helm.
  3. Future upgrades will not face this error.

Note: This will change any dynamic value for the LoadBalancer of the NGINX Ingress from this chart, if in use. See the global Ingress settings documentation for more details regarding externalIP. You may need to update DNS records!
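
Because the external address may change, it can be useful to record the current Service details before deleting anything. A minimal sketch: snapshot_services is a hypothetical helper, and the default output file name services-before.txt is an assumption.

```shell
# Hypothetical helper: save the current Service details for a release so the
# old cluster and external IPs can be compared after the Services are recreated.
snapshot_services() {
  release="$1"
  out_file="${2:-services-before.txt}"
  kubectl get services -l "release=$release" -o wide > "$out_file" || return 1
  echo "saved Service list to $out_file"
}
```

Run it with the same release name used in the delete command, then compare the saved EXTERNAL-IP values with the new ones after the upgrade to see whether DNS records need updating.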

spec.selector

Sidekiq pods did not receive a unique selector prior to chart release 3.0.0. The problems with this were documented in.

Upgrades to 3.0.0 using Helm will automatically delete the old Sidekiq deployments and create new ones by appending -v1 to the name of the Sidekiq Deployments, HPAs, and Pods.

Starting from 5.5.0, Helm deletes old Sidekiq deployments from prior versions and uses the -v2 suffix for Pods, Deployments, and HPAs.

If you continue to run into this error on the Sidekiq deployment when installing 3.0.0, resolve it with the following steps:

  1. Remove the Sidekiq Deployments:

    kubectl delete deployment --cascade -lrelease=RELEASE_NAME,app=sidekiq
    
  2. Perform an upgrade via Helm.

cannot patch “RELEASE-NAME-cert-manager” with kind Deployment

Upgrading from CertManager version 0.10 introduced a number of breaking changes. The old Custom Resource Definitions must be uninstalled and removed from Helm’s tracking and then re-installed.

The Helm chart attempts to do this by default, but if you encounter this error, you may need to take manual action.

If this error message was encountered, then upgrading requires one more step than normal in order to ensure the new Custom Resource Definitions are actually applied to the deployment.

  1. Remove the old CertManager Deployment.

     kubectl delete deployments -l app=cert-manager --cascade
    
  2. Run the upgrade again, this time installing the new Custom Resource Definitions:

     helm upgrade --install --values - YOUR-RELEASE-NAME gitlab/gitlab < <(helm get values YOUR-RELEASE-NAME)
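
The command in step 2 relies on process substitution (the < <(...) form), which is a Bash feature and is not available in a plain POSIX sh. The following sketch does the same job with a temporary file. It is an illustration only: reapply_values is a hypothetical helper, the chart argument defaulting to gitlab/gitlab is an assumption, and -o yaml is used to ask Helm for plain YAML output.

```shell
# Hypothetical helper: re-run the upgrade with the values currently stored
# for the release, passing them through a temporary file.
reapply_values() {
  release="$1"
  chart="${2:-gitlab/gitlab}"
  values_file="$(mktemp)" || return 1
  if ! helm get values "$release" -o yaml > "$values_file"; then
    rm -f "$values_file"
    return 1
  fi
  helm upgrade --install "$release" "$chart" --values "$values_file"
  status=$?
  rm -f "$values_file"   # clean up whether or not the upgrade succeeded
  return "$status"
}
```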
    

cannot patch gitlab-kube-state-metrics with kind Deployment

Upgrading from Prometheus version 11.16.9 to 15.0.4 changes the selector labels used on the kube-state-metrics Deployment, which is disabled by default (prometheus.kubeStateMetrics.enabled=false).

If this error message is encountered, meaning prometheus.kubeStateMetrics.enabled=true, then upgrading requires an additional step:

  1. Remove the old kube-state-metrics Deployment.

    kubectl delete deploym