Troubleshooting

All steps noted here are for DEVELOPMENT ENVIRONMENTS ONLY. Administrators may find the information insightful, but the outlined fixes are destructive and would have a major negative impact on production systems.

Passwords and secrets failing or unsynchronized

Developers commonly deploy, delete, and re-deploy a release into the same cluster multiple times. Kubernetes secrets and persistent volume claims created by StatefulSets are intentionally not removed by helm delete RELEASE_NAME.

Removing only the Kubernetes secrets leaves the release in an inconsistent state. For example, a new deployment’s migration pod will fail because GitLab Rails cannot connect to the database: the newly generated secret no longer matches the password stored in the retained database volume.

To completely wipe a release from a development environment including secrets, a developer must remove both the secrets and the persistent volume claims.

# DO NOT run these commands in a production environment. Disaster will strike.
kubectl delete secrets,pvc -lrelease=RELEASE_NAME
Note: This deletes all Kubernetes secrets, including TLS certificates, and all data in the database. Do not perform this on a production instance.
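
Before running the delete, it can help to confirm which cluster is targeted and which objects the label selector matches. The following is a minimal sketch, not part of the chart tooling: the function names and the KUBECTL override variable are assumptions introduced here so that the commands can be swapped for echo in a dry run.

```shell
# Sketch for development clusters only.
# Assumption: KUBECTL is a variable introduced for this example; it defaults
# to the real kubectl binary and can be set to "echo" for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Print the kubectl context that any following command would act on.
show_target_context() {
  "$KUBECTL" config current-context
}

# List the secrets and PVCs carrying the release label, without deleting them.
# usage: preview_release_leftovers RELEASE_NAME
preview_release_leftovers() {
  if [ "$#" -ne 1 ]; then
    echo "usage: preview_release_leftovers RELEASE_NAME" >&2
    return 1
  fi
  "$KUBECTL" get secrets,pvc -l "release=$1"
}
```

Calling show_target_context and then preview_release_leftovers RELEASE_NAME lists exactly what the delete command above would remove, using the same label selector.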

Database is broken and needs reset

To reset the database in a development environment:

  1. Delete the PostgreSQL StatefulSet.
  2. Delete the PostgreSQL PersistentVolumeClaim.
  3. Deploy GitLab again with helm upgrade --install.
Note: This deletes all data in the databases and should not be run in production.
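
The three reset steps could be wrapped in a small helper along these lines. This is a sketch under stated assumptions: the StatefulSet name, the PersistentVolumeClaim name, and the chart reference are not given in this section, so they are passed in as arguments (look them up with kubectl get statefulsets,pvc), and KUBECTL and HELM are override variables introduced only for this example.

```shell
# Sketch for development clusters only. This destroys all database data.
# Assumption: KUBECTL and HELM are variables introduced for this example;
# set them to "echo" to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"
HELM="${HELM:-helm}"

# usage: reset_dev_database RELEASE STATEFULSET PVC CHART [extra helm args...]
# STATEFULSET, PVC, and CHART are placeholders supplied by the caller.
reset_dev_database() {
  if [ "$#" -lt 4 ]; then
    echo "usage: reset_dev_database RELEASE STATEFULSET PVC CHART [helm args...]" >&2
    return 1
  fi
  release="$1" sts="$2" pvc="$3" chart="$4"
  shift 4
  # Step 1: delete the PostgreSQL StatefulSet (and with it the database pod).
  "$KUBECTL" delete statefulset "$sts" &&
    # Step 2: delete the claim that holds the database data.
    "$KUBECTL" delete pvc "$pvc" &&
    # Step 3: redeploy; pass the same values flags used for the original deploy.
    "$HELM" upgrade --install "$release" "$chart" "$@"
}
```

Each step runs only if the previous one succeeded, so a failed delete does not trigger a redeploy against a half-removed database.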

Backup used for testing needs to be updated

Certain jobs in CI use a backup of GitLab during testing. Complete the steps below to update this backup when needed:

  1. Generate the desired backup by running a CI pipeline for the matching stable branch.
    1. For example, if the current release is 5-5-stable, run a CI pipeline for branch 5-4-stable to create a backup of 14.4.
    2. Running this pipeline requires the Maintainer role.
  2. In that pipeline, cancel the QA jobs (but leave the spec tests) so that we don’t get extra data in the backup.
  3. Let the spec tests finish. They will have installed the old backup, and migrated the instance to the version we want.
  4. Scale the gitlab-runner Deployment to 0 replicas so that the Runner shuts down.
  5. Log in to the UI and delete the Runner from the admin section. This should help avoid cipher errors later.
  6. Ensure that all background migrations have completed, forcing them to complete if needed.
  7. Delete the toolbox Pod to ensure there is no existing tmp data, keeping the backup small.
  8. If any manual work is needed to modify the contents of the backup, complete it before moving on to the next step.
  9. Create a new backup from the new toolbox Pod.
  10. Download the new backup from the CI instance of MinIO in the gitlab-backups bucket.
  11. Rename and upload the backup to the proper location in Google Cloud Storage (GCS):
    1. Project: cloud-native-182609, path: gitlab-charts-ci/test-backups/
    2. Name format: $VERSION_gitlab_backup.tar (example: 14.4.2_gitlab_backup.tar)
    3. Edit access and add Entity=Public, Name=allUsers, and Access=Reader.
  12. Finally, update .variables.TEST_BACKUP_PREFIX in .gitlab-ci.yml to the new version of the backup.

Future pipelines will now use the new backup artifact during testing.
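
The command-line parts of steps 4, 7, 9, and 11 above might be scripted roughly as follows. This is a sketch, not the team's official procedure. It assumes that the toolbox image provides the backup-utility command, that gitlab-charts-ci in the path above is the GCS bucket name, and that gsutil is installed and authenticated. The Deployment name, Pod name, and local file name are placeholders to look up in the CI cluster, and KUBECTL and GSUTIL are override variables introduced for this example.

```shell
# Sketch for the CI/development cluster only.
# Assumption: KUBECTL and GSUTIL are variables introduced for this example;
# set them to "echo" to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"
GSUTIL="${GSUTIL:-gsutil}"
# Assumption: "gitlab-charts-ci" in the documented path is the bucket name.
BUCKET_PATH="gs://gitlab-charts-ci/test-backups"

# Step 4: scale the Runner Deployment down to zero replicas.
# usage: stop_runner RUNNER_DEPLOYMENT
stop_runner() {
  "$KUBECTL" scale deployment "$1" --replicas=0
}

# Step 7: delete the toolbox Pod so its Deployment starts a clean one.
# usage: recycle_toolbox TOOLBOX_POD
recycle_toolbox() {
  "$KUBECTL" delete pod "$1"
}

# Step 9: create a backup from the new toolbox Pod.
# usage: create_backup NEW_TOOLBOX_POD
create_backup() {
  "$KUBECTL" exec "$1" -- backup-utility
}

# Step 11.2: build the object name in the $VERSION_gitlab_backup.tar format.
# usage: backup_object_name VERSION
backup_object_name() {
  printf '%s_gitlab_backup.tar\n' "$1"
}

# Step 11: upload the downloaded backup under its new name.
# usage: upload_backup LOCAL_FILE VERSION
upload_backup() {
  "$GSUTIL" cp "$1" "${BUCKET_PATH}/$(backup_object_name "$2")"
}
```

For example, backup_object_name 14.4.2 prints 14.4.2_gitlab_backup.tar, matching the name format in step 11. Granting public read access (step 11.3) and the remaining UI steps are left as manual actions.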