ClickHouse integration guidelines

  • Tier: Free, Premium, Ultimate
  • Offering: GitLab.com, GitLab Self-Managed, GitLab Dedicated
  • Status: Beta on GitLab Self-Managed and GitLab Dedicated

For more information on plans for ClickHouse support for GitLab Self-Managed, see this epic.

For more information about ClickHouse support for GitLab Dedicated, see ClickHouse for GitLab Dedicated.

ClickHouse is an open-source column-oriented database management system. It can efficiently filter, aggregate, and query across large data sets.

ClickHouse is a secondary data store for GitLab. Only specific data is stored in ClickHouse, to power advanced analytical features such as GitLab Duo and SDLC trends, and CI Analytics.

You can connect GitLab either to ClickHouse Cloud or to a ClickHouse server or cluster that you run yourself.

Supported ClickHouse versions

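First GitLab version | ClickHouse versions | Comment
-------------------- | ------------------- | -------
17.7.0               | 23.x (24.x, 25.x)   | To use ClickHouse 24.x or 25.x, see "Database schema migrations on GitLab 18.0.0 and earlier" under Troubleshooting.
18.1.0               | 23.x, 24.x, 25.x    |
18.5.0               | 23.x, 24.x, 25.x    | Experimental support for the Replicated database engine.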

ClickHouse Cloud is supported. Compatibility with ClickHouse Cloud is generally ensured for the latest major GitLab release and later versions.
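When planning an upgrade, it can help to encode the supported-versions table as data. The following Ruby sketch is an illustration only: `SUPPORTED_CLICKHOUSE` and `clickhouse_support` are hypothetical names, not part of GitLab, and the rows are copied from the table. It treats every GitLab version from 17.7.0 up to (but not including) 18.1.0 as needing the migration workaround for ClickHouse 24.x and 25.x.

```ruby
require 'rubygems' # provides Gem::Version; already loaded in most Ruby installs

# Hypothetical encoding of the supported-versions table, ordered by GitLab version.
# :majors lists ClickHouse majors that work without extra steps;
# :workaround_majors lists majors that need the migration workaround.
SUPPORTED_CLICKHOUSE = [
  { since: '17.7.0', majors: [23], workaround_majors: [24, 25] },
  { since: '18.1.0', majors: [23, 24, 25], workaround_majors: [] },
  { since: '18.5.0', majors: [23, 24, 25], workaround_majors: [] } # adds experimental Replicated engine
].freeze

# Returns :supported, :workaround, :unsupported, or :not_listed.
def clickhouse_support(gitlab_version, clickhouse_version)
  gitlab = Gem::Version.new(gitlab_version)
  major = clickhouse_version.split('.').first.to_i
  # Use the last row whose "since" version is at or below the GitLab version.
  row = SUPPORTED_CLICKHOUSE.select { |r| Gem::Version.new(r[:since]) <= gitlab }.last
  return :not_listed if row.nil? # GitLab versions before 17.7.0 are not in the table

  return :supported if row[:majors].include?(major)
  return :workaround if row[:workaround_majors].include?(major)

  :unsupported
end

puts clickhouse_support('18.0.0', '24.8.4') # prints: workaround
puts clickhouse_support('18.2.1', '25.3.1') # prints: supported
puts clickhouse_support('18.2.1', '22.8.1') # prints: unsupported
```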

Set up ClickHouse

To set up ClickHouse with GitLab:

  1. Run a ClickHouse cluster and configure the database.
  2. Configure the GitLab connection to ClickHouse.
  3. Run ClickHouse migrations.

Run and configure ClickHouse

When you run ClickHouse on a hosted server, resource consumption depends on several factors, such as the number of builds that run on your instance each month, the selected hardware, and the data center that hosts ClickHouse. In general, the cost should not be significant.

To create the necessary user and database objects:

  1. Generate a secure password and save it.

  2. Sign in to the ClickHouse SQL console.

  3. Run the following SQL statements. Replace PASSWORD_HERE with the generated password.

    CREATE DATABASE gitlab_clickhouse_main_production;
    CREATE USER gitlab IDENTIFIED WITH sha256_password BY 'PASSWORD_HERE';
    CREATE ROLE gitlab_app;
    GRANT SELECT, INSERT, ALTER, CREATE, UPDATE, DROP, TRUNCATE, OPTIMIZE ON gitlab_clickhouse_main_production.* TO gitlab_app;
    GRANT SELECT ON information_schema.* TO gitlab_app;
    GRANT gitlab_app TO gitlab;
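Steps 1 and 3 can be combined in a small script. The following Ruby sketch, an illustration rather than a GitLab-provided tool, generates a password with Ruby's standard `SecureRandom` library and prints the statements from step 3 with the password filled in. The `setup_sql` method is a hypothetical name. Because the output contains the password, run it only in a trusted terminal and store the password securely.

```ruby
require 'securerandom'

# Hypothetical helper: renders the setup statements for a given password and database.
def setup_sql(password, database: 'gitlab_clickhouse_main_production')
  # Alphanumeric passwords contain no quotes, so they are safe inside '...'.
  raise ArgumentError, 'password must be alphanumeric' unless password.match?(/\A[A-Za-z0-9]+\z/)

  <<~SQL
    CREATE DATABASE #{database};
    CREATE USER gitlab IDENTIFIED WITH sha256_password BY '#{password}';
    CREATE ROLE gitlab_app;
    GRANT SELECT, INSERT, ALTER, CREATE, UPDATE, DROP, TRUNCATE, OPTIMIZE ON #{database}.* TO gitlab_app;
    GRANT SELECT ON information_schema.* TO gitlab_app;
    GRANT gitlab_app TO gitlab;
  SQL
end

password = SecureRandom.alphanumeric(32) # 32 random characters from [A-Za-z0-9]
puts setup_sql(password)
```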

Configure the GitLab connection to ClickHouse

To provide GitLab with ClickHouse credentials:

For Linux package installations:

  1. Edit /etc/gitlab/gitlab.rb:

    gitlab_rails['clickhouse_databases']['main']['database'] = 'gitlab_clickhouse_main_production'
    gitlab_rails['clickhouse_databases']['main']['url'] = 'https://example.com/path'
    gitlab_rails['clickhouse_databases']['main']['username'] = 'gitlab'
    gitlab_rails['clickhouse_databases']['main']['password'] = 'PASSWORD_HERE' # replace with the actual password
  2. Save the file and reconfigure GitLab:

    sudo gitlab-ctl reconfigure

For Helm chart installations:

  1. Save the ClickHouse password as a Kubernetes Secret:

    kubectl create secret generic gitlab-clickhouse-password --from-literal="main_password=PASSWORD_HERE"
  2. Export the Helm values:

    helm get values gitlab > gitlab_values.yaml
  3. Edit gitlab_values.yaml:

    global:
      clickhouse:
        enabled: true
        main:
          username: gitlab
          password:
            secret: gitlab-clickhouse-password
            key: main_password
          database: gitlab_clickhouse_main_production
          url: 'http://example.com'
  4. Save the file and apply the new values:

    helm upgrade -f gitlab_values.yaml gitlab gitlab/gitlab

To verify that your connection is set up successfully:

  1. Sign in to the Rails console.

  2. Execute the following command:

    ClickHouse::Client.select('SELECT 1', :main)

    If successful, the command returns [{"1"=>1}].
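If the Rails console check fails, testing the ClickHouse HTTP interface directly, outside of Rails, can help separate network or credential problems from GitLab configuration problems. The following Ruby sketch uses only the standard library. It sends `SELECT 1` as the request body, passes credentials in the `X-ClickHouse-User` and `X-ClickHouse-Key` headers, and selects the database with the `database` URL parameter, all of which the ClickHouse HTTP interface accepts. The method names and the `CLICKHOUSE_URL`, `CLICKHOUSE_USER`, and `CLICKHOUSE_PASSWORD` environment variables are hypothetical.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Hypothetical helper: builds a POST request that runs `SELECT 1`
# over the ClickHouse HTTP interface.
def clickhouse_select_one_request(url, user, password, database)
  uri = URI(url)
  uri.query = URI.encode_www_form(database: database)
  request = Net::HTTP::Post.new(uri)
  request['X-ClickHouse-User'] = user
  request['X-ClickHouse-Key'] = password
  request.body = 'SELECT 1'
  [uri, request]
end

# Sends the request; with the default TabSeparated format, a healthy
# server answers with the body "1\n".
def clickhouse_reachable?(url, user, password, database)
  uri, request = clickhouse_select_one_request(url, user, password, database)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https',
                             open_timeout: 5, read_timeout: 10) do |http|
    http.request(request)
  end
  response.is_a?(Net::HTTPSuccess) && response.body.strip == '1'
end

# Only contacts a server when CLICKHOUSE_URL is set.
if ENV['CLICKHOUSE_URL']
  ok = clickhouse_reachable?(ENV['CLICKHOUSE_URL'], ENV.fetch('CLICKHOUSE_USER', 'gitlab'),
                             ENV.fetch('CLICKHOUSE_PASSWORD'), 'gitlab_clickhouse_main_production')
  puts ok ? 'ClickHouse responded to SELECT 1' : 'ClickHouse check failed'
end
```

For example, run it with CLICKHOUSE_URL set to the same URL you configured for GitLab.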

Run ClickHouse migrations

To create the required database objects, run:

sudo gitlab-rake gitlab:clickhouse:migrate

Enable ClickHouse for Analytics

Now that your GitLab instance is connected to ClickHouse, you can turn on the features that use it by enabling ClickHouse for Analytics.

Replicated database engine

For a multi-node, high-availability setup, GitLab supports the Replicated database engine in ClickHouse. This support is experimental and available from GitLab 18.5.0.

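Prerequisites:

  • The cluster must be defined in the remote_servers section of the ClickHouse server configuration.
  • The cluster, shard, and replica macros must be defined on every node, because the statements below use them.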

When configuring the database, you must run the statements with the ON CLUSTER clause. In the following example, replace CLUSTER_NAME_HERE with your cluster’s name:

CREATE DATABASE gitlab_clickhouse_main_production ON CLUSTER CLUSTER_NAME_HERE ENGINE = Replicated('/clickhouse/databases/{cluster}/gitlab_clickhouse_main_production', '{shard}', '{replica}');
CREATE USER gitlab ON CLUSTER CLUSTER_NAME_HERE IDENTIFIED WITH sha256_password BY 'PASSWORD_HERE';
CREATE ROLE gitlab_app ON CLUSTER CLUSTER_NAME_HERE;
GRANT ON CLUSTER CLUSTER_NAME_HERE SELECT, INSERT, ALTER, CREATE, UPDATE, DROP, TRUNCATE, OPTIMIZE ON gitlab_clickhouse_main_production.* TO gitlab_app;
GRANT ON CLUSTER CLUSTER_NAME_HERE SELECT ON information_schema.* TO gitlab_app;
GRANT ON CLUSTER CLUSTER_NAME_HERE gitlab_app TO gitlab;
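ClickHouse replaces the {cluster}, {shard}, and {replica} macros with values from each server's macros configuration, so all replicas share one database path while each registers under its own replica name. The following Ruby sketch only imitates that substitution to show what each node ends up with. The `expand_macros` method and the macro values are hypothetical; the real substitution happens inside ClickHouse.

```ruby
# Hypothetical illustration of macro substitution: every `{name}` in the
# template is replaced with the node's value for `name`.
def expand_macros(template, macros)
  template.gsub(/\{(\w+)\}/) do
    name = Regexp.last_match(1)
    macros.fetch(name) { raise KeyError, "macro {#{name}} is not defined on this node" }
  end
end

path = '/clickhouse/databases/{cluster}/gitlab_clickhouse_main_production'

# Assumed macro values for two replicas of the same shard.
nodes = [
  { 'cluster' => 'gitlab_cluster', 'shard' => '01', 'replica' => 'node-1' },
  { 'cluster' => 'gitlab_cluster', 'shard' => '01', 'replica' => 'node-2' }
]

nodes.each do |macros|
  puts "#{expand_macros(path, macros)} shard=#{expand_macros('{shard}', macros)} " \
       "replica=#{expand_macros('{replica}', macros)}"
end
```

Both nodes print the same path and shard but different replica names; a node that lacks one of the macros would fail, which is why the macros are a prerequisite.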

Load balancer considerations

The GitLab application communicates with the ClickHouse cluster through the HTTP or HTTPS interface. To distribute requests across the nodes of the cluster, consider placing an HTTP proxy such as chproxy in front of it.
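A proxy such as chproxy takes care of load balancing for you. To show only the basic idea of spreading requests across nodes, the following Ruby sketch rotates through several ClickHouse HTTP endpoints in round-robin order and skips endpoints marked as unhealthy. The `RoundRobinEndpoints` class and the endpoint URLs are hypothetical; this is a simplified illustration, not a description of how chproxy works, and not meant for production use.

```ruby
# Hypothetical round-robin selector over ClickHouse HTTP endpoints.
class RoundRobinEndpoints
  def initialize(urls)
    raise ArgumentError, 'at least one endpoint is required' if urls.empty?

    @urls = urls.dup
    @unhealthy = {}
    @index = 0
    @mutex = Mutex.new # keeps selection consistent across threads
  end

  def mark_unhealthy(url)
    @mutex.synchronize { @unhealthy[url] = true }
  end

  def mark_healthy(url)
    @mutex.synchronize { @unhealthy.delete(url) }
  end

  # Returns the next healthy endpoint, or nil if every endpoint is unhealthy.
  def next_url
    @mutex.synchronize do
      @urls.size.times do
        url = @urls[@index]
        @index = (@index + 1) % @urls.size
        return url unless @unhealthy[url]
      end
      nil
    end
  end
end

pool = RoundRobinEndpoints.new(%w[http://ch-1:8123 http://ch-2:8123 http://ch-3:8123])
pool.mark_unhealthy('http://ch-2:8123')
4.times { puts pool.next_url } # ch-1, ch-3, ch-1, ch-3
```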

Troubleshooting

Database schema migrations on GitLab 18.0.0 and earlier

On GitLab 18.0.0 and earlier, running the ClickHouse database schema migrations might fail on ClickHouse 24.x and 25.x with the following error:

Code: 344. DB::Exception: Projection is fully supported in ReplacingMergeTree with deduplicate_merge_projection_mode = throw. Use 'drop' or 'rebuild' option of deduplicate_merge_projection_mode

The ClickHouse integration does not work until all migrations have run.

To work around this issue and run the migrations:

  1. Sign in to the Rails console.

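  2. Run the following command. It marks migrations 20231114142100 and 20240115162101 as already applied, so the migration task skips them: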

    ClickHouse::Client.execute("INSERT INTO schema_migrations (version) VALUES ('20231114142100'), ('20240115162101')", :main)
  3. Migrate the database again:

    sudo gitlab-rake gitlab:clickhouse:migrate

This time, the migrations should finish successfully.