Install and register GitLab Runner for autoscaling with Docker Machine
- The autoscaling feature was introduced in GitLab Runner 1.1.0.
For an overview of the autoscale architecture, take a look at the comprehensive documentation on autoscaling.
Forked version of Docker Machine
Docker has deprecated Docker Machine. However, GitLab maintains a Docker Machine fork for GitLab Runner users who rely on the Docker Machine executor. This fork is based on the latest main branch of docker-machine, with some additional patches for the following bugs:
- Make DigitalOcean driver RateLimit aware
- Add backoff to Google driver operations check
- Add --google-min-cpu-platform option for machine creation
- Use cached IP for Google driver
- Use cached IP for AWS driver
- Add support for using GPUs in Google Compute Engine
- Support running AWS instances with IMDSv2
The intent of the Docker Machine fork is to only fix critical issues and bugs which affect running costs. No new features will be added.
Preparing the environment
To use the autoscale feature, Docker and GitLab Runner must be installed on the same machine:
- Log in to a new Linux-based machine that will serve as the bastion server from which Docker spawns new machines.
- Install GitLab Runner.
- Install Docker Machine from the Docker Machine fork.
- Optionally, but recommended: prepare a proxy container registry and a cache server to use with the autoscaled runners.
Configuring GitLab Runner
- Familiarize yourself with the core concepts of using docker-machine with gitlab-runner.
- The first time you use Docker Machine, manually execute the docker-machine create ... command with your Docker Machine driver. Run this command with the options that you intend to configure in MachineOptions under the [runners.machine] section. This sets up the Docker Machine environment properly and also validates the specified options. Afterward, you can destroy the machine with docker-machine rm [machine_name] and start the runner.
  Avoid multiple concurrent docker-machine create requests at first use. When the docker+machine executor is used, the runner may spin up a few concurrent docker-machine create commands. If Docker Machine has not been used before in this environment, each started process tries to prepare SSH keys and SSL certificates (for Docker API authentication between GitLab Runner and Docker Engine on the autoscaled machine), and these concurrent processes interfere with each other. This can leave the environment in a non-working state. That's why it's important to create a test machine manually the very first time you set up GitLab Runner with Docker Machine.
- Register a runner and select the docker+machine executor when asked.
- Edit config.toml and configure the runner to use Docker Machine. Visit the dedicated page covering detailed information about GitLab Runner Autoscaling.
- Now, you can try to start a new pipeline in your project. After a few seconds, if you run docker-machine ls, you should see a new machine being created.
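As a rough sketch of the manual first run described above, the helper below prints the docker-machine create command that corresponds to a set of MachineOptions entries, so that it can be reviewed and run once by hand. It assumes that each key=value entry maps to a --key=value flag. The function name build_create_cmd, the machine name test-machine, and the option values are placeholders, not part of GitLab Runner.

```shell
#!/bin/sh
# Sketch: print the manual `docker-machine create` command for a set of
# MachineOptions entries so it can be reviewed and run once by hand.
# Assumption: a MachineOptions entry "key=value" maps to "--key=value".
build_create_cmd() {
  driver="$1"
  name="$2"
  shift 2
  cmd="docker-machine create --driver $driver"
  for opt in "$@"; do
    cmd="$cmd --$opt"
  done
  printf '%s %s\n' "$cmd" "$name"
}

# Placeholder values; replace them with your own driver and options.
build_create_cmd google test-machine \
  "google-project=your-google-project" \
  "google-machine-type=n1-standard-1"
```

After running the printed command once, docker-machine ls should list the test machine, and docker-machine rm test-machine removes it before the runner is started.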
Upgrading GitLab Runner
- Check if your operating system is configured to automatically restart GitLab Runner (for example, by checking its service file):
  - If yes, ensure that the service manager is configured to use SIGQUIT and use the service's tools to stop the process:

        # For systemd
        sudo systemctl stop gitlab-runner

        # For upstart
        sudo service gitlab-runner stop

  - If no, you may stop the process manually:

        sudo killall -SIGQUIT gitlab-runner

  Sending the SIGQUIT signal makes the process stop gracefully. The process stops accepting new jobs, and exits as soon as the current jobs are finished.
- Wait until GitLab Runner exits. You can check its status with gitlab-runner status or await a graceful shutdown for up to 30 minutes with:

      for i in `seq 1 180`; do # 1800 seconds = 30 minutes
          gitlab-runner status || break
          sleep 10
      done

- You can now safely install the new version of GitLab Runner without interrupting any jobs.
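One possible way to check the SIGQUIT requirement from the first step on a systemd host is sketched below. The helper reads a unit definition on standard input and succeeds only if it sets KillSignal=SIGQUIT. The function name unit_uses_sigquit is hypothetical, and the check assumes the signal is written by name rather than by number.

```shell
#!/bin/sh
# Sketch: read a systemd unit definition on stdin and succeed only if it
# sets KillSignal=SIGQUIT.
# Assumption: the signal is written by name (SIGQUIT), not by number.
unit_uses_sigquit() {
  grep -Eq '^[[:space:]]*KillSignal[[:space:]]*=[[:space:]]*SIGQUIT[[:space:]]*$'
}

# Example on a systemd host:
#   systemctl cat gitlab-runner | unit_uses_sigquit && echo "graceful stop configured"
```

If the check fails, stopping the service through the service manager may not give running jobs the chance to finish.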
Using the forked version of Docker Machine
Install
- Download the appropriate docker-machine binary. Copy the binary to a directory in your PATH and make it executable. For example, to download and install v0.16.2-gitlab.29:

      curl -O "https://gitlab-docker-machine-downloads.s3.amazonaws.com/v0.16.2-gitlab.29/docker-machine-Linux-x86_64"
      cp docker-machine-Linux-x86_64 /usr/local/bin/docker-machine
      chmod +x /usr/local/bin/docker-machine
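If you install the fork on several hosts, it can help to keep the release tag in one place. The sketch below builds the download URL from a tag. It assumes that other releases follow the same URL layout as v0.16.2-gitlab.29, which this page does not confirm, and the function name fork_download_url is hypothetical.

```shell
#!/bin/sh
# Sketch: build the download URL for a given release of the Docker Machine fork.
# Assumption: other releases follow the same URL layout as v0.16.2-gitlab.29.
fork_download_url() {
  printf 'https://gitlab-docker-machine-downloads.s3.amazonaws.com/%s/docker-machine-Linux-x86_64\n' "$1"
}

fork_download_url "v0.16.2-gitlab.29"

# After installing, confirm which binary is found on PATH and its version:
#   command -v docker-machine
#   docker-machine --version
```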
Using GPUs on Google Compute Engine
- Introduced in GitLab Docker Machine 0.16.2-gitlab.10 and GitLab Runner 13.9.
You can use the Docker Machine fork to create Google Compute Engine instances with graphics processing units (GPUs). GitLab Runner 13.9 is required for GPUs to work in a Docker executor.
Docker Machine GPU options
To create an instance with GPUs, use these Docker Machine options:
Option | Example | Description |
---|---|---|
--google-accelerator | type=nvidia-tesla-p4,count=1 | Specifies the type and number of GPU accelerators to attach to the instance (type=TYPE,count=N format). |
--google-maintenance-policy | TERMINATE | Always use TERMINATE because Google Cloud does not allow live migration of GPU instances. |
--google-machine-image | https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110 | The URL of a GPU-enabled operating system. See the list of available images. |
--google-metadata | install-nvidia-driver=True | This flag tells the image to install the NVIDIA GPU driver. |
These arguments map to command-line arguments for gcloud compute. See the Google documentation on creating VMs with attached GPUs for more details.
Verifying Docker Machine options
To prepare your system and test that GPUs can be created with Google Compute Engine:
- Set up the Google Compute Engine driver credentials for Docker Machine. You may need to export environment variables to the runner if your VM does not have a default service account. How this is done depends on how the runner is launched. For example:
  - Via systemd or upstart: See the documentation on setting custom environment variables.
  - Via Kubernetes with the Helm Chart: Update the values.yaml entry.
  - Via Docker: Use the -e option (for example, docker run -e GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json gitlab/gitlab-runner).
- Verify that docker-machine can create a virtual machine with your desired options. For example, to create an n1-standard-1 machine with a single NVIDIA Tesla P4 accelerator, substitute test-gpu with a name and run:

      docker-machine create --driver google --google-project your-google-project \
        --google-disk-size 50 \
        --google-machine-type n1-standard-1 \
        --google-accelerator type=nvidia-tesla-p4,count=1 \
        --google-maintenance-policy TERMINATE \
        --google-machine-image https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110 \
        --google-metadata "install-nvidia-driver=True" test-gpu

- To verify the GPU is active, SSH into the machine and run nvidia-smi:

      $ docker-machine ssh test-gpu sudo nvidia-smi
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |                               |                      |               MIG M. |
      |===============================+======================+======================|
      |   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
      | N/A   43C    P0    22W /  75W |      0MiB /  7611MiB |      3%      Default |
      |                               |                      |                  N/A |
      +-------------------------------+----------------------+----------------------+

      +-----------------------------------------------------------------------------+
      | Processes:                                                                  |
      |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
      |        ID   ID                                                   Usage      |
      |=============================================================================|
      |  No running processes found                                                 |
      +-----------------------------------------------------------------------------+

- Remove this test instance to save money:

      docker-machine rm test-gpu

Configuring GitLab Runner
- Once you have verified these options, configure the Docker executor to use all available GPUs in the runners.docker configuration. Then add the Docker Machine options to your MachineOptions settings in the GitLab Runner runners.machine configuration. For example:

      [runners.docker]
        gpus = "all"
      [runners.machine]
        MachineOptions = [
          "google-project=your-google-project",
          "google-disk-size=50",
          "google-disk-type=pd-ssd",
          "google-machine-type=n1-standard-1",
          "google-accelerator=count=1,type=nvidia-tesla-p4",
          "google-maintenance-policy=TERMINATE",
          "google-machine-image=https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110",
          "google-metadata=install-nvidia-driver=True"
        ]

Troubleshooting
When working with the Docker Machine executor, you might encounter the following issues.
ERROR: Error creating machine
When installing Docker Machine, you might encounter an error that states ERROR: Error creating machine: Error running provisioning: error installing docker.
Docker Machine attempts to install Docker on a newly provisioned virtual machine using this script:

    if ! type docker; then curl -sSL "https://get.docker.com" | sh -; fi

If the docker command succeeds, Docker Machine assumes Docker is installed and continues.
If it does not succeed, Docker Machine attempts to download and run the script at https://get.docker.com. If the installation fails, it's possible the operating system is no longer supported by Docker.
To troubleshoot this issue, you can enable debugging on Docker Machine by setting MACHINE_DEBUG=true in the environment where GitLab Runner is installed.
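For a one-off manual reproduction, the variable can also be set for a single command instead of the whole environment. The wrapper below is a minimal sketch of that idea. The function name with_machine_debug is hypothetical, and when the runner is started by a service manager the variable still has to be set in the service's own environment.

```shell
#!/bin/sh
# Sketch: run one command with Docker Machine debugging enabled, without
# changing the environment of the current shell.
with_machine_debug() {
  env MACHINE_DEBUG=true "$@"
}

# Example (options elided; test-machine is a placeholder name):
#   with_machine_debug docker-machine create --driver google ... test-machine
```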
ERROR: Cannot connect to the Docker daemon
The job might fail during the prepare stage with an error message:
Preparing environment
ERROR: Job failed (system failure): prepare environment: Cannot connect to the Docker daemon at tcp://10.200.142.223:2376. Is the docker daemon running? (docker.go:650:120s). Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
This error occurs when the Docker daemon fails to start within the expected time in the VM created by the Docker Machine executor. To fix this issue, increase the wait_for_services_timeout value in the [runners.docker] section.
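Before changing the value, it can be useful to see what is currently configured. The sketch below prints any wait_for_services_timeout values found in a config.toml supplied on standard input. The function name current_services_timeout is hypothetical, the value 120 in the inline fragment is illustrative only, and the pattern assumes the value is written as a plain non-negative integer.

```shell
#!/bin/sh
# Sketch: print the wait_for_services_timeout value(s) found in a
# config.toml supplied on stdin.
# Assumption: the value is written as a plain non-negative integer.
current_services_timeout() {
  sed -n 's/^[[:space:]]*wait_for_services_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p'
}

# Example with an inline fragment; 120 is an illustrative value only.
current_services_timeout <<'EOF'
[runners.docker]
  wait_for_services_timeout = 120
EOF
```

To inspect a real configuration, redirect the file into the function, for example current_services_timeout < /etc/gitlab-runner/config.toml, where the path depends on how GitLab Runner was installed.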