This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. As with all projects, the items mentioned on this page are subject to change or delay. The development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
Status: proposed
Authors: @niskhakova, @dmakovey
Coach: @grzesiek
DRIs: @dorrino, @nhxnguyen
Owning Stage: workinggroup clickhouse
Created: 2023-04-04

ClickHouse Self-Managed component costs and maintenance requirements

Summary

ClickHouse requires additional cost and maintenance for self-managed customers:

  • Resource allocation cost: ClickHouse requires a considerable amount of resources to run optimally.
    • The minimum cost estimate shows that running ClickHouse is practical only for very large Reference Architectures: 25k users and up.
  • High availability: ClickHouse SaaS supports HA. There is currently no documented HA configuration for self-managed ClickHouse.
  • Geo setups: Sync and replication complexity for GitLab Geo setups.
  • Upgrades: ClickHouse is an additional database to maintain and upgrade alongside the existing PostgreSQL database. This includes mapping each GitLab version to a compatible ClickHouse version and keeping both up to date.
  • Backup and restore: Self-managed customers need an engineer who is familiar with ClickHouse backup strategies and disaster recovery processes, or they need to switch to ClickHouse SaaS.
  • Monitoring: ClickHouse can be monitored with Prometheus, but it is one more component to monitor and troubleshoot.
  • Limitations: Azure object storage is not supported. GitLab does not have the documentation or support expertise to assist customers with deployment and operation of self-managed ClickHouse.
  • ClickHouse SaaS: Customers who run a self-managed GitLab instance because of regulatory or compliance requirements, or who have latency concerns, likely cannot use ClickHouse SaaS.

Minimum self-managed component costs

Based on an analysis of the ClickHouse spec requirements and in collaboration with the ClickHouse team, we identified the following minimal configurations for self-managed ClickHouse:

  1. ClickHouse High Availability (HA)
    • ClickHouse - 2 machines, each with >=16 cores, >=64 GB RAM, SSD, and a 10 Gbps network. Each machine also runs Keeper.
    • Keeper - 1 machine with 2 CPUs, 4 GB RAM, and an SSD with high IOPS.
  2. ClickHouse non-HA
    • ClickHouse - 1 machine with >=16 cores, >=64 GB RAM, SSD, and a 10 Gbps network.
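
As a rough illustration of what these minimums add up to, the sketch below totals machines, cores, and RAM for each configuration. The `Node` class and the helper names are hypothetical and exist only for this example. The numbers are the lower bounds listed above, not sizing guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One group of identical machines (hypothetical helper type)."""
    role: str
    count: int
    min_cores: int
    min_ram_gb: int


# Minimum specs copied from the list above; real deployments may need more.
HA = [
    Node("clickhouse+keeper", count=2, min_cores=16, min_ram_gb=64),
    Node("keeper", count=1, min_cores=2, min_ram_gb=4),
]
NON_HA = [Node("clickhouse", count=1, min_cores=16, min_ram_gb=64)]


def totals(nodes):
    """Return (machines, cores, ram_gb) summed over all node groups."""
    machines = sum(n.count for n in nodes)
    cores = sum(n.count * n.min_cores for n in nodes)
    ram_gb = sum(n.count * n.min_ram_gb for n in nodes)
    return machines, cores, ram_gb


def keeper_members(nodes):
    """Count the machines that run Keeper, whether dedicated or co-located."""
    return sum(n.count for n in nodes if "keeper" in n.role)


if __name__ == "__main__":
    for name, nodes in (("HA", HA), ("non-HA", NON_HA)):
        machines, cores, ram_gb = totals(nodes)
        print(f"{name}: {machines} machines, >= {cores} cores, >= {ram_gb} GB RAM")
    print(f"Keeper members in HA: {keeper_members(HA)}")
```

Under these minimums the HA layout has three Keeper members in total: two co-located with ClickHouse and one dedicated.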

The following cost table was compiled by pricing the machine CPU and memory requirements for ClickHouse and comparing that cost to the GitLab Reference Architecture sizes and costs from the GCP calculator.

Reference Architecture   ClickHouse type   ClickHouse cost / (GitLab cost + ClickHouse cost)
1k - non HA              non-HA            78.01%
2k - non HA              non-HA            44.50%
3k - HA                  HA                37.87%
5k - HA                  HA                30.92%
10k - HA                 HA                20.47%
25k - HA                 HA                14.30%
50k - HA                 HA                8.16%
Note: The ClickHouse Self-Managed component evaluation is a minimum cost estimate based on a simplified architecture.

The following components increase the cost, and were not considered in the minimum calculation:

  • Disk size - depends on data size, hard to estimate.
  • Disk types - ClickHouse recommends fast SSDs.
  • Network usage - ClickHouse recommends using a 10 Gbps network, if possible.
  • For HA, we assume the same minimum cost across all reference architectures from 3k to 50k users, but HA specs tend to increase with user count.
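
The percentages in the cost table are shares of the combined bill, so they can understate how much ClickHouse adds relative to GitLab alone. The sketch below shows the ratio used in the table header and a simple rearrangement of it. The function names are hypothetical, and the shares are copied from the table. The absolute GCP prices are not part of this page, so none are assumed here.

```python
# Shares copied from the cost table:
# ClickHouse cost / (GitLab cost + ClickHouse cost).
SHARES = {
    "1k": 0.7801,
    "2k": 0.4450,
    "3k": 0.3787,
    "5k": 0.3092,
    "10k": 0.2047,
    "25k": 0.1430,
    "50k": 0.0816,
}


def clickhouse_share(gitlab_cost: float, clickhouse_cost: float) -> float:
    """Fraction of the combined cost that goes to ClickHouse."""
    return clickhouse_cost / (gitlab_cost + clickhouse_cost)


def overhead_on_gitlab(share: float) -> float:
    """ClickHouse cost relative to the GitLab-only cost.

    From share = c / (g + c) it follows that c / g = share / (1 - share).
    """
    return share / (1.0 - share)


if __name__ == "__main__":
    for size, share in SHARES.items():
        extra = overhead_on_gitlab(share)
        print(f"{size:>4}: share {share:.2%}, adds {extra:.0%} on top of the GitLab cost")
```

Read this way, the 1k row means the minimum ClickHouse setup costs about 3.5 times as much as the GitLab deployment itself, while at 50k it adds roughly 9%.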

Resources