Install and register GitLab Runner for autoscaling with Docker Machine

The autoscale feature was introduced in GitLab Runner 1.1.0.

For an overview of the autoscale architecture, see the autoscaling documentation.

Forked version of Docker machine

Because docker-machine is in maintenance mode, GitLab provides its own fork of docker-machine, based on the latest master branch of docker-machine with additional patches for several upstream bugs.

The intent of this fork is to fix only critical bugs and bugs that affect running costs. No new features will be added.

Preparing the environment

To use the autoscale feature, Docker and GitLab Runner must be installed on the same machine:

  1. Log in to a new Linux-based machine that will serve as a bastion server, from which Docker will spawn new machines.
  2. Install GitLab Runner.
  3. Install Docker Machine.
  4. Optional but recommended: prepare a proxy container registry and a cache server to use with the autoscaled runners.

If you need to use any virtualization or cloud providers that aren't handled by Docker Machine's internal drivers, you must install the appropriate driver plugin. Installing and configuring Docker Machine driver plugins is outside the scope of this documentation. For more details, see the Docker Machine documentation.
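After completing the steps above, a small shell sketch can report which of the required tools are already on the bastion host's PATH (the binary names come from the steps above; nothing else is assumed, and nothing is installed):

```shell
# Report which binaries the autoscale setup expects are available on PATH.
# This only inspects the environment; it does not install anything.
for bin in docker gitlab-runner docker-machine; do
  if command -v "$bin" >/dev/null 2>&1; then
    echo "found:   $bin"
  else
    echo "missing: $bin"
  fi
done
```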

Configuring GitLab Runner

  1. Register a runner and select the docker+machine executor when asked.
  2. Edit config.toml and configure the runner to use Docker Machine. See the dedicated page on GitLab Runner autoscaling for detailed information.
  3. The first time you use Docker Machine, it's best to manually execute docker-machine create ... with your chosen driver and all options from the MachineOptions section. This sets up the Docker Machine environment properly and also validates the specified options. After this, you can destroy the machine with docker-machine rm [machine_name] and start the runner.

    note: Avoid multiple concurrent docker-machine create requests on first use. When the docker+machine executor is used, the runner may spin up a few concurrent docker-machine create commands. If Docker Machine has not been used before in this environment, each started process tries to prepare SSH keys and SSL certificates (for Docker API authentication between GitLab Runner and Docker Engine on the autoscaled machine), and these concurrent processes interfere with each other. This can result in a non-working environment. That's why it's important to create a test machine manually the very first time you set up GitLab Runner with Docker Machine.
  4. Now you can start a new pipeline in your project. After a few seconds, if you run docker-machine ls, you should see a new machine being created.
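The result of steps 1 and 2 is a config.toml entry along these lines (a minimal sketch; the name, URL, token, and machine options are placeholders, not values from this document):

```toml
concurrent = 4

[[runners]]
  name = "autoscale-runner"            # placeholder name
  url = "https://gitlab.example.com/"  # placeholder instance URL
  token = "RUNNER_TOKEN"               # placeholder token from registration
  executor = "docker+machine"
  [runners.docker]
    image = "alpine:latest"
  [runners.machine]
    IdleCount = 1
    IdleTime = 1800
    MachineDriver = "google"           # any Docker Machine driver
    MachineName = "auto-scale-%s"      # %s is required; replaced per machine
    MachineOptions = [
      "google-project=your-google-project",  # driver-specific options
    ]
```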

Upgrading GitLab Runner

  1. Check if your operating system is configured to automatically restart GitLab Runner (for example, by checking its service file):
    • if yes, ensure that the service manager is configured to use SIGQUIT, and use the service's tools to stop the process:

      # For systemd
      sudo systemctl stop gitlab-runner
      
      # For upstart
      sudo service gitlab-runner stop
      
    • if no, you may stop the process manually:

      sudo killall -SIGQUIT gitlab-runner
      
    note: Sending the SIGQUIT signal makes the process stop gracefully. The process stops accepting new jobs and exits as soon as the current jobs finish.
  2. Wait until GitLab Runner exits. You can check its status with gitlab-runner status or await a graceful shutdown for up to 30 minutes with:

    for i in $(seq 1 180); do # 180 checks x 10 seconds = 30 minutes
        gitlab-runner status || break
        sleep 10
    done
    
  3. You can now safely install the new version of GitLab Runner without interrupting any jobs.
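The wait loop in step 2 is just a bounded poll: keep checking until the status command fails or the attempt budget runs out. The same pattern can be seen in isolation with a stub standing in for gitlab-runner status, so it runs anywhere (the stub and the shortened sleep are illustration-only):

```shell
# Bounded polling, as in step 2, with a stub in place of `gitlab-runner status`.
# The stub reports "still running" for the first 3 checks, then "stopped".
checks=0
runner_is_running() {        # stub; the real loop calls `gitlab-runner status`
  [ "$checks" -lt 3 ]
}
for i in $(seq 1 180); do    # 180 attempts x 10 s sleep = 30 minutes
  runner_is_running || break
  checks=$((checks + 1))
  sleep 0                    # 10 in the real loop
done
echo "runner stopped after $checks checks"
```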

Using the forked version of Docker Machine

Install

  1. Download the appropriate docker-machine binary. Copy the binary to a location in your PATH and make it executable. For example, to download and install v0.16.2-gitlab.11:

     curl -O "https://gitlab-docker-machine-downloads.s3.amazonaws.com/v0.16.2-gitlab.11/docker-machine-Linux-x86_64"
     cp docker-machine-Linux-x86_64 /usr/local/bin/docker-machine
     chmod +x /usr/local/bin/docker-machine
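The cp and chmod pair above can also be combined into a single install(1) invocation, which copies the file and sets its mode in one step. A sketch, exercised here against a stub file rather than the real download (no network and no /usr/local write access assumed):

```shell
# Same effect as the cp + chmod pair above, using install(1).
# A stub file stands in for the curl download so this runs anywhere.
set -eu
workdir=$(mktemp -d)
printf '#!/bin/sh\necho "docker-machine stub"\n' > "$workdir/docker-machine-Linux-x86_64"
# Real flow: install -m 0755 docker-machine-Linux-x86_64 /usr/local/bin/docker-machine
install -m 0755 "$workdir/docker-machine-Linux-x86_64" "$workdir/docker-machine"
"$workdir/docker-machine"    # the stub prints "docker-machine stub"
rm -r "$workdir"
```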
    

Using GPUs on Google Compute Engine

Introduced in GitLab Docker Machine 0.16.2-gitlab.10 and GitLab Runner 13.9.

You can use the Docker Machine fork to create Google Compute Engine instances with graphics processing units (GPUs). GitLab Runner 13.9 is required for GPUs to work in a Docker executor.

Docker Machine GPU options

To create an instance with GPUs, use these Docker Machine options:

  • --google-accelerator
    Example: type=nvidia-tesla-p4,count=1
    Specifies the type and number of GPU accelerators to attach to the instance (type=TYPE,count=N format).
  • --google-maintenance-policy
    Example: TERMINATE
    Always use TERMINATE, because Google Cloud does not allow GPU instances to be live migrated.
  • --google-machine-image
    Example: https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110
    The URL of a GPU-enabled operating system image. See the list of available images.
  • --google-metadata
    Example: install-nvidia-driver=True
    Tells the image to install the NVIDIA GPU driver.

These arguments map to command-line arguments for gcloud compute. See the Google documentation on creating VMs with attached GPUs for more details.
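The --google-accelerator value packs two fields into one type=TYPE,count=N string. Splitting it with standard shell parameter expansion shows the two pieces (the value is the example from the table above):

```shell
# Split the type=TYPE,count=N accelerator spec into its two fields.
accel="type=nvidia-tesla-p4,count=1"
type_field=${accel%%,*}      # everything before the comma: type=nvidia-tesla-p4
count_field=${accel##*,}     # everything after the comma:  count=1
echo "GPU type:  ${type_field#type=}"
echo "GPU count: ${count_field#count=}"
```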

Verifying Docker Machine options

To prepare your system and test that GPUs can be created with Google Compute Engine:

  1. Set up the Google Compute Engine driver credentials for Docker Machine.

  2. Verify that docker-machine can create a virtual machine with your desired options. For example, to create an n1-standard-1 machine with a single NVIDIA Tesla P4 accelerator, substitute test-gpu with a name and run:

    docker-machine create --driver google --google-project your-google-project \
      --google-disk-size 50 \
      --google-machine-type n1-standard-1 \
      --google-accelerator type=nvidia-tesla-p4,count=1 \
      --google-maintenance-policy TERMINATE \
      --google-machine-image https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110 \
      --google-metadata "install-nvidia-driver=True" test-gpu
    
  3. To verify the GPU is active, SSH into the machine and run nvidia-smi:

    $ docker-machine ssh test-gpu sudo nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
    | N/A   43C    P0    22W /  75W |      0MiB /  7611MiB |      3%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    
  4. Remove this test instance to save money:

     docker-machine rm test-gpu
    

Configuring GitLab Runner

  1. Once you have verified these options, configure the Docker executor to use all available GPUs in the runners.docker configuration. Then add the Docker Machine options to your MachineOptions settings in the GitLab Runner runners.machine configuration. For example:

    [runners.docker]
      gpus = "all"
    [runners.machine]
      MachineOptions = [
        "google-project=your-google-project",
        "google-disk-size=50",
        "google-disk-type=pd-ssd",
        "google-machine-type=n1-standard-1",
        "google-accelerator=count=1,type=nvidia-tesla-p4",
        "google-maintenance-policy=TERMINATE",
        "google-machine-image=https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110",
        "google-metadata=install-nvidia-driver=True"
      ]
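With that configuration in place, a quick way to confirm that jobs actually see the GPU is a pipeline job that runs nvidia-smi inside a CUDA image. A hypothetical .gitlab-ci.yml fragment (the job name and image tag are assumptions, not from this document):

```yaml
# Hypothetical smoke-test job; it fails if the executor did not expose the GPU.
check-gpu:
  image: nvidia/cuda:11.0-base   # assumed CUDA base image
  script:
    - nvidia-smi
```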