Install and register GitLab Runner for autoscaling with Docker Machine

Tier: Free, Premium, Ultimate
Offering: GitLab.com, Self-managed
History
  • The autoscaling feature was introduced in GitLab Runner 1.1.0.
Note: The Docker Machine executor was deprecated in GitLab 17.5. If you’re using the Docker Machine executor on Amazon Web Services (AWS) EC2, Microsoft Azure Compute, or Google Compute Engine (GCE), migrate to the GitLab Runner Autoscaler.

For an overview of the autoscale architecture, see the documentation on autoscaling.

Forked version of Docker Machine

Docker has deprecated Docker Machine. However, GitLab maintains a Docker Machine fork for GitLab Runner users who rely on the Docker Machine executor. This fork is based on the latest main branch of docker-machine, with some additional bug-fix patches.

The intent of the Docker Machine fork is to only fix critical issues and bugs which affect running costs. No new features will be added.

Preparing the environment

To use the autoscale feature, Docker and GitLab Runner must be installed on the same machine:

  1. Log in to a new Linux-based machine that will serve as a bastion server from which Docker Machine spawns new machines.
  2. Install GitLab Runner.
  3. Install Docker Machine from the Docker Machine fork.
  4. Optional but recommended: prepare a proxy container registry and a cache server to be used with the autoscaled runners.

Configuring GitLab Runner

  1. Familiarize yourself with the core concepts of using docker-machine with gitlab-runner.
  2. The first time you use Docker Machine, manually run the docker-machine create ... command with your Docker Machine driver. Use the options that you intend to configure in MachineOptions under the [runners.machine] section. This sets up the Docker Machine environment properly and validates the specified options. Afterward, you can destroy the machine with docker-machine rm [machine_name] and start the runner.

    Note: Avoid running multiple concurrent docker-machine create commands the first time Docker Machine is used. When the docker+machine executor is used, the runner may spin up a few concurrent docker-machine create commands. If Docker Machine was not used before in this environment, each started process tries to prepare SSH keys and SSL certificates (for Docker API authentication between GitLab Runner and Docker Engine on the autoscaled machine), and these concurrent processes interfere with each other. This can leave the environment in a non-working state. That’s why it’s important to create a test machine manually the very first time you set up GitLab Runner with Docker Machine.
  3. Register a runner and select the docker+machine executor when asked.
  4. Edit config.toml and configure the runner to use Docker Machine. For details, see the dedicated page on GitLab Runner autoscaling.
  5. Start a new pipeline in your project. After a few seconds, docker-machine ls should show a new machine being created.
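
The manual first-run check from step 2 can be sketched as a small shell helper. This is only an illustration, not part of GitLab Runner: the function name validate_machine_options and the DOCKER_MACHINE override variable are hypothetical, and the driver options you pass should be the ones you plan to put in MachineOptions.

```shell
# Hypothetical helper: create a throwaway machine with the given options,
# list machines, then remove the test machine again.
# DOCKER_MACHINE may point to another binary; it defaults to docker-machine.
# Usage: validate_machine_options NAME [docker-machine create options...]
validate_machine_options() {
  name=$1
  shift
  dm=${DOCKER_MACHINE:-docker-machine}
  # Stop early if the machine cannot be created with these options.
  "$dm" create "$@" "$name" || return 1
  "$dm" ls
  # -y skips the interactive confirmation prompt.
  "$dm" rm -y "$name"
}
```

For example, `validate_machine_options test-machine --driver google --google-project your-google-project` creates and removes one test machine, which also prepares the SSH keys and certificates before the runner starts creating machines concurrently.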

Upgrading GitLab Runner

  1. Check whether your operating system is configured to automatically restart GitLab Runner (for example, by checking its service file):
    • If yes, ensure that the service manager is configured to use SIGQUIT, then use the service’s tools to stop the process:

      # For systemd
      sudo systemctl stop gitlab-runner
      
      # For upstart
      sudo service gitlab-runner stop
      
    • If no, you can stop the process manually:

      sudo killall -SIGQUIT gitlab-runner
      
    Note: Sending the SIGQUIT signal makes the process stop gracefully. The process stops accepting new jobs, and exits as soon as the current jobs are finished.
  2. Wait until GitLab Runner exits. You can check its status with gitlab-runner status or await a graceful shutdown for up to 30 minutes with:

    for i in $(seq 1 180); do # 180 checks x 10 seconds = 1800 seconds = 30 minutes
        gitlab-runner status || break
        sleep 10
    done
    
  3. You can now safely install the new version of GitLab Runner without interrupting any jobs.
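
The wait in step 2 can also be written as a reusable function that reports whether the runner actually stopped before the time limit. This is a minimal sketch: the name wait_for_exit is hypothetical, and it relies on the same assumption as the loop above, namely that the status command returns a non-zero exit code once the runner is no longer running.

```shell
# Hypothetical helper: poll a status command until it fails.
# Usage: wait_for_exit MAX_TRIES INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 when the command reports "not running", 1 on timeout.
wait_for_exit() {
  max_tries=$1
  interval=$2
  shift 2
  i=0
  while [ "$i" -lt "$max_tries" ]; do
    # A non-zero exit status is treated as "the process has stopped".
    if ! "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```

For example, `wait_for_exit 180 10 gitlab-runner status || echo "runner still running after 30 minutes"` makes a timeout visible instead of silently continuing with the upgrade.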

Using the forked version of Docker Machine

Install

  1. Download the appropriate docker-machine binary. Copy the binary to a directory on your PATH and make it executable. For example, to download and install v0.16.2-gitlab.29:

     # Download the release binary into the current directory
     curl -O "https://gitlab-docker-machine-downloads.s3.amazonaws.com/v0.16.2-gitlab.29/docker-machine-Linux-x86_64"
     # Install it on the PATH (writing to /usr/local/bin may require root privileges)
     cp docker-machine-Linux-x86_64 /usr/local/bin/docker-machine
     chmod +x /usr/local/bin/docker-machine
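
If you script the installation, the download URL can be assembled from the release version. The sketch below is illustrative: the function name docker_machine_url is hypothetical, and it assumes that binaries for other platforms follow the same docker-machine-OS-ARCH naming as the Linux x86_64 example above, which you should confirm for your platform.

```shell
# Hypothetical helper: print the download URL for a fork release.
# Usage: docker_machine_url VERSION [OS] [ARCH]
# OS and ARCH default to the values reported by uname on this host.
docker_machine_url() {
  version=$1
  os=${2:-$(uname -s)}
  arch=${3:-$(uname -m)}
  echo "https://gitlab-docker-machine-downloads.s3.amazonaws.com/${version}/docker-machine-${os}-${arch}"
}
```

For example, `curl -fLO "$(docker_machine_url v0.16.2-gitlab.29)"` downloads the binary and fails with a non-zero exit code if the URL does not exist.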
    

Using GPUs on Google Compute Engine

History
  • Introduced in GitLab Docker Machine 0.16.2-gitlab.10 and GitLab Runner 13.9.
Note: GPUs are supported on every executor. It is not necessary to use Docker Machine just for GPU support. The Docker Machine executor makes it easy to scale the GPU nodes up and down, but this can also be done with the Kubernetes executor.

You can use the Docker Machine fork to create Google Compute Engine instances with graphics processing units (GPUs). GitLab Runner 13.9 or later is required for GPUs to work in a Docker executor.

Docker Machine GPU options

To create an instance with GPUs, use these Docker Machine options:

| Option | Example | Description |
|--------|---------|-------------|
| --google-accelerator | type=nvidia-tesla-p4,count=1 | Specifies the type and number of GPU accelerators to attach to the instance (type=TYPE,count=N format). |
| --google-maintenance-policy | TERMINATE | Always use TERMINATE because Google Cloud does not allow live migration of GPU instances. |
| --google-machine-image | https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110 | The URL of a GPU-enabled operating system. See the list of available images. |
| --google-metadata | install-nvidia-driver=True | Tells the image to install the NVIDIA GPU driver. |

These arguments map to command-line arguments for gcloud compute. See the Google documentation on creating VMs with attached GPUs for more details.

Verifying Docker Machine options

To prepare your system and test that GPUs can be created with Google Compute Engine:

  1. Set up the Google Compute Engine driver credentials for Docker Machine. You may need to export environment variables to the runner if your VM does not have a default service account. How this is done depends on how the runner is launched.

  2. Verify that docker-machine can create a virtual machine with your desired options. For example, to create an n1-standard-1 machine with a single NVIDIA Tesla P4 accelerator, replace test-gpu with a name of your choice and run:

    docker-machine create --driver google --google-project your-google-project \
      --google-disk-size 50 \
      --google-machine-type n1-standard-1 \
      --google-accelerator type=nvidia-tesla-p4,count=1 \
      --google-maintenance-policy TERMINATE \
      --google-machine-image https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110 \
      --google-metadata "install-nvidia-driver=True" test-gpu
    
  3. To verify the GPU is active, SSH into the machine and run nvidia-smi:

    $ docker-machine ssh test-gpu sudo nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
    | N/A   43C    P0    22W /  75W |      0MiB /  7611MiB |      3%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    
  4. Remove this test instance to avoid unnecessary costs:

    docker-machine rm test-gpu
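
For step 1, the Google driver for Docker Machine reads Application Default Credentials, which are commonly supplied through the GOOGLE_APPLICATION_CREDENTIALS environment variable. The following sketch checks that the variable points to a readable key file before you run docker-machine create. The function name check_gce_credentials is hypothetical, and the key file path is a placeholder.

```shell
# Hypothetical helper: verify that service account credentials are available
# in the current environment before calling docker-machine.
check_gce_credentials() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "credentials file is not readable: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 1
  fi
  echo "using credentials from $GOOGLE_APPLICATION_CREDENTIALS"
}
```

For example, run `export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json` and then `check_gce_credentials` in the same shell. The runner process needs the same variable in its own environment, which depends on how the runner is launched.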
    

Configuring GitLab Runner

  1. Once you have verified these options, configure the Docker executor to use all available GPUs in the runners.docker configuration. Then add the Docker Machine options to your MachineOptions settings in the GitLab Runner runners.machine configuration. For example:

    [runners.docker]
      gpus = "all"
    [runners.machine]
      MachineOptions = [
        "google-project=your-google-project",
        "google-disk-size=50",
        "google-disk-type=pd-ssd",
        "google-machine-type=n1-standard-1",
        "google-accelerator=count=1,type=nvidia-tesla-p4",
        "google-maintenance-policy=TERMINATE",
        "google-machine-image=https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110",
        "google-metadata=install-nvidia-driver=True"
      ]
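
Comparing the docker-machine create command with the configuration above, each MachineOptions entry is the command-line flag without its leading dashes, joined to its value with an equals sign. The sketch below prints entries in that form from a list of flag and value pairs. The function name to_machine_options is hypothetical and the helper is only a convenience for copying verified options into config.toml.

```shell
# Hypothetical helper: convert "--flag value" pairs into MachineOptions entries.
# Usage: to_machine_options --flag1 value1 --flag2 value2 ...
to_machine_options() {
  while [ "$#" -ge 2 ]; do
    flag=${1#--}   # strip the leading "--" from the flag name
    printf '"%s=%s",\n' "$flag" "$2"
    shift 2
  done
}
```

For example, `to_machine_options --google-disk-size 50 --google-maintenance-policy TERMINATE` prints one quoted entry per line, ready to paste between the brackets of MachineOptions.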
    

Troubleshooting

When working with the Docker Machine executor, you might encounter the following issues.

ERROR: Error creating machine

When installing Docker Machine, you might encounter an error that states ERROR: Error creating machine: Error running provisioning: error installing docker.

Docker Machine attempts to install Docker on a newly provisioned virtual machine using this script:

if ! type docker; then curl -sSL "https://get.docker.com" | sh -; fi

If the docker command succeeds, Docker Machine assumes Docker is installed and continues.

If it does not succeed, Docker Machine attempts to download and run the script at https://get.docker.com. If the installation fails, it’s possible the operating system is no longer supported by Docker.

To troubleshoot this issue, you can enable debugging on Docker Machine by setting MACHINE_DEBUG=true in the environment where GitLab Runner is installed.
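
How you set MACHINE_DEBUG depends on how GitLab Runner is launched. As one possible approach, assuming the runner is managed by systemd under the service name gitlab-runner, a drop-in file can add the variable to the service environment. The function name write_debug_dropin and the file name machine-debug.conf are hypothetical.

```shell
# Hypothetical helper: write a systemd drop-in that sets MACHINE_DEBUG=true.
# Usage: write_debug_dropin [DROP_IN_DIRECTORY]
# The default directory assumes a systemd unit named gitlab-runner.
write_debug_dropin() {
  dir=${1:-/etc/systemd/system/gitlab-runner.service.d}
  mkdir -p "$dir" || return 1
  printf '[Service]\nEnvironment="MACHINE_DEBUG=true"\n' > "$dir/machine-debug.conf"
}
```

Writing to the default directory requires root privileges. After the file is in place, systemd must reload its configuration and the runner service must be restarted before the variable takes effect, so stop the runner gracefully first if jobs are running.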

ERROR: Cannot connect to the Docker daemon

The job might fail during the prepare stage with an error message:

Preparing environment 
ERROR: Job failed (system failure): prepare environment: Cannot connect to the Docker daemon at tcp://10.200.142.223:2376. Is the docker daemon running? (docker.go:650:120s). Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information

This error occurs when the Docker daemon fails to start within the expected time in the VM created by the Docker Machine executor. To fix this issue, increase the wait_for_services_timeout value in the [runners.docker] section.