ClickHouse
- Tier: Free, Premium, Ultimate
- Offering: GitLab.com, GitLab Self-Managed, GitLab Dedicated
- Status: Beta on GitLab Self-Managed and GitLab Dedicated
ClickHouse is an open-source column-oriented database management system. It can efficiently filter, aggregate, and query across large data sets.
GitLab uses ClickHouse as a secondary data store to enable advanced analytics features such as GitLab Duo, SDLC trends, and CI Analytics. GitLab stores only the data that supports these features in ClickHouse.
You should use ClickHouse Cloud to connect ClickHouse to GitLab.
Alternatively, you can bring your own ClickHouse. For more information, see ClickHouse recommendations for GitLab Self-Managed.
Analytics available with ClickHouse
After you configure ClickHouse, you can use the following analytics features:
| Feature | Description |
|---|---|
| Runner fleet dashboard | Displays runner usage metrics and job wait times. Provides export of CSV files containing job counts and executed runner minutes by runner type and job status for each project. |
| Contribution analytics | Provides analytics of group member contributions (push events, issues, merge requests) over time. ClickHouse reduces the likelihood of timeout issues for large instances. |
| GitLab Duo and SDLC trends | Measures the impact of GitLab Duo on software development performance. Tracks development metrics (deployment frequency, lead time, change failure rate, time to restore) alongside AI-specific indicators (GitLab Duo seat adoption, Code Suggestions acceptance rates, and GitLab Duo Chat usage). |
| GraphQL API for AI Metrics | Provides programmatic access to GitLab Duo and SDLC trend data through the AiMetrics, AiUserMetrics, and AiUsageData endpoints. Provides export of pre-aggregated metrics and raw event data for integration with BI tools and custom analytics. |
Supported ClickHouse versions
The supported ClickHouse version differs depending on your GitLab version:
- GitLab 17.7 and later supports ClickHouse 23.x. To use either ClickHouse 24.x or 25.x, use the workaround.
- GitLab 18.1 and later supports ClickHouse 23.x, 24.x, and 25.x.
- GitLab 18.8 and later supports ClickHouse 23.x, 24.x, 25.x, and the Replicated database engine.
ClickHouse Cloud is always compatible with the latest stable GitLab release.
Set up ClickHouse
Choose your deployment type based on your operational requirements:
- ClickHouse Cloud (Recommended): Fully managed service with automatic upgrades, backups, and scaling.
- ClickHouse for GitLab Self-Managed (BYOC): Complete control over your infrastructure and configuration.
After setting up your ClickHouse instance:
- Create the GitLab database and user.
- Configure the GitLab connection.
- Verify the connection.
- Run ClickHouse migrations.
- Enable ClickHouse for Analytics.
Set up ClickHouse Cloud
Prerequisites:
- Have a ClickHouse Cloud account.
- Enable network connectivity from your GitLab instance to ClickHouse Cloud.
- Be an administrator of your GitLab instance.
To set up ClickHouse Cloud:
- Sign in to ClickHouse Cloud.
- Select New Service.
- Choose your service tier:
- Development: For testing and development environments.
- Production: For production workloads with high availability.
- Select your cloud provider and region. Choose a region close to your GitLab instance for optimal performance.
- Configure your service name and settings.
- Select Create Service.
- Once provisioned, note your connection details from the service dashboard:
  - Host
  - Port (usually `9440` for secure connections)
  - Username
  - Password
ClickHouse Cloud automatically handles version upgrades and security patches. Enterprise Edition (EE) customers can schedule upgrades to control when they occur, and avoid unexpected service interruptions during business hours. For more information, see upgrade ClickHouse.
After you create your ClickHouse Cloud service, you then create the GitLab database and user.
Set up ClickHouse for GitLab Self-Managed (BYOC)
Prerequisites:
- Have a ClickHouse instance installed and running. If ClickHouse is not installed, see the ClickHouse installation documentation.
- Have a supported ClickHouse version.
- Enable network connectivity from your GitLab instance to ClickHouse.
- Be an administrator for both ClickHouse and your GitLab instance.
For ClickHouse for GitLab Self-Managed, you are responsible for planning and executing version upgrades, security patches, and backups. For more information, see Upgrade ClickHouse.
Configure high availability
For a multi-node, high-availability (HA) setup, GitLab supports the Replicated table engine in ClickHouse.
Prerequisites:
- Have a ClickHouse cluster with multiple nodes. A minimum of three nodes is recommended.
- Define a cluster in the `remote_servers` configuration section.
- Configure the following macros in your ClickHouse configuration:
  - `cluster`
  - `shard`
  - `replica`
When configuring the database for HA, you must run the statements with the `ON CLUSTER` clause.
For more information, see ClickHouse Replicated database engine documentation.
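The `cluster`, `shard`, and `replica` macros are set per node in a drop-in configuration file. A minimal sketch, assuming a hypothetical cluster named `gitlab_cluster` and a node named `ch-node1`; on a real node the file belongs in `/etc/clickhouse-server/config.d/`:

```shell
# Write an illustrative macros file (to /tmp here so the sketch is runnable;
# use /etc/clickhouse-server/config.d/macros.xml on a real node).
# The cluster, shard, and replica values are placeholders and differ per node.
cat > /tmp/macros.xml <<'EOF'
<clickhouse>
    <macros>
        <cluster>gitlab_cluster</cluster>
        <shard>01</shard>
        <replica>ch-node1</replica>
    </macros>
</clickhouse>
EOF

cat /tmp/macros.xml
```

These macro values are what the `{cluster}`, `{shard}`, and `{replica}` substitutions in `ON CLUSTER` DDL statements expand to on each node.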
Configure a load balancer
The GitLab application communicates with the ClickHouse cluster through the HTTP/HTTPS interface. For HA deployments, use an HTTP proxy or load balancer to distribute requests across ClickHouse cluster nodes.
Recommended load balancer options:
- chproxy - ClickHouse-specific HTTP proxy with built-in caching and routing.
- HAProxy - General-purpose TCP/HTTP load balancer.
- NGINX - Web server with load balancing capabilities.
- Cloud provider load balancers (AWS Application Load Balancer, GCP Load Balancer, Azure Load Balancer).
Basic chproxy configuration example:
```yaml
server:
  http:
    listen_addr: ":8080"

clusters:
  - name: "clickhouse_cluster"
    nodes: [
      "http://ch-node1:8123",
      "http://ch-node2:8123",
      "http://ch-node3:8123"
    ]

users:
  - name: "gitlab"
    password: "your_secure_password"
    to_cluster: "clickhouse_cluster"
    to_user: "gitlab"
```

When using a load balancer, configure GitLab to connect to the load balancer URL instead of individual ClickHouse nodes.
For more information, see chproxy documentation.
After you configure your ClickHouse for GitLab Self-Managed instance, create the GitLab database and user.
Verify ClickHouse installation
Before configuring the database, verify ClickHouse is installed and accessible:
- Check that ClickHouse is running:

  ```shell
  clickhouse-client --query "SELECT version()"
  ```

  If ClickHouse is running, you see the version number (for example, `24.3.1.12`).

- Verify you can connect with credentials:

  ```shell
  clickhouse-client --host your-clickhouse-host --port 9440 --secure --user default --password 'your-password'
  ```

  If you have not configured TLS yet, use port `9000` without the `--secure` flag for initial testing.
Create database and user
To create the necessary user and database objects:

- Generate a secure password and save it.
- Sign in to:
  - For ClickHouse Cloud, the ClickHouse SQL console.
  - For ClickHouse for GitLab Self-Managed, the `clickhouse-client`.
- Run the following commands, replacing `PASSWORD_HERE` with the generated password:
```sql
CREATE DATABASE gitlab_clickhouse_main_production;
CREATE USER gitlab IDENTIFIED WITH sha256_password BY 'PASSWORD_HERE';
CREATE ROLE gitlab_app;
GRANT SELECT, INSERT, ALTER, CREATE, UPDATE, DROP, TRUNCATE, OPTIMIZE ON gitlab_clickhouse_main_production.* TO gitlab_app;
GRANT SELECT ON information_schema.* TO gitlab_app;
GRANT gitlab_app TO gitlab;
```

For HA cluster deployments, run the statements with the `ON CLUSTER` clause instead. Replace `CLUSTER_NAME_HERE` with your cluster's name:

```sql
CREATE DATABASE gitlab_clickhouse_main_production ON CLUSTER CLUSTER_NAME_HERE ENGINE = Replicated('/clickhouse/databases/{cluster}/gitlab_clickhouse_main_production', '{shard}', '{replica}');
CREATE USER gitlab ON CLUSTER CLUSTER_NAME_HERE IDENTIFIED WITH sha256_password BY 'PASSWORD_HERE';
CREATE ROLE gitlab_app ON CLUSTER CLUSTER_NAME_HERE;
GRANT ON CLUSTER CLUSTER_NAME_HERE SELECT, INSERT, ALTER, CREATE, UPDATE, DROP, TRUNCATE, OPTIMIZE ON gitlab_clickhouse_main_production.* TO gitlab_app;
GRANT ON CLUSTER CLUSTER_NAME_HERE SELECT ON information_schema.* TO gitlab_app;
GRANT ON CLUSTER CLUSTER_NAME_HERE gitlab_app TO gitlab;
```

Configure the GitLab connection
To provide GitLab with ClickHouse credentials:
For Linux package installations:

- Edit `/etc/gitlab/gitlab.rb`:

  ```ruby
  gitlab_rails['clickhouse_databases']['main']['database'] = 'gitlab_clickhouse_main_production'
  gitlab_rails['clickhouse_databases']['main']['url'] = 'https://your-clickhouse-host:port'
  gitlab_rails['clickhouse_databases']['main']['username'] = 'gitlab'
  gitlab_rails['clickhouse_databases']['main']['password'] = 'PASSWORD_HERE' # replace with the actual password
  ```

  Replace the URL with:

  - For ClickHouse Cloud: `https://your-service.clickhouse.cloud:9440`
  - For ClickHouse for GitLab Self-Managed: `https://your-clickhouse-host:8443`
  - For ClickHouse for GitLab Self-Managed HA with load balancer: `https://your-load-balancer:8080` (or your load balancer URL)

- Save the file and reconfigure GitLab:

  ```shell
  sudo gitlab-ctl reconfigure
  ```
For Helm chart (Kubernetes) installations:

- Save the ClickHouse password as a Kubernetes secret:

  ```shell
  kubectl create secret generic gitlab-clickhouse-password --from-literal="main_password=PASSWORD_HERE"
  ```

- Export the Helm values:

  ```shell
  helm get values gitlab > gitlab_values.yaml
  ```

- Edit `gitlab_values.yaml`:

  ```yaml
  global:
    clickhouse:
      enabled: true
      main:
        username: gitlab
        password:
          secret: gitlab-clickhouse-password
          key: main_password
        database: gitlab_clickhouse_main_production
        url: 'https://your-clickhouse-host:port'
  ```

  Replace the URL with:

  - For ClickHouse Cloud: `https://your-service.clickhouse.cloud:9440`
  - For ClickHouse for GitLab Self-Managed single node: `https://your-clickhouse-host:8443`
  - For ClickHouse for GitLab Self-Managed HA with load balancer: `https://your-load-balancer:8080` (or your load balancer URL)

- Save the file and apply the new values:

  ```shell
  helm upgrade -f gitlab_values.yaml gitlab gitlab/gitlab
  ```
For production deployments, configure TLS/SSL on your ClickHouse instance and use https:// URLs. For GitLab Self-Managed installations, see the Network Security documentation.
Verify the connection
To verify that your connection is set up successfully:
- Sign in to the Rails console.
- Execute the following command:

  ```ruby
  ClickHouse::Client.select('SELECT 1', :main)
  ```

  If successful, the command returns `[{"1"=>1}]`.
If the connection fails, verify:
- ClickHouse service is running and accessible.
- Network connectivity from GitLab to ClickHouse. Check that firewalls and security groups allow connections.
- Connection URL is correct (host, port, protocol).
- Credentials are correct.
- For HA cluster deployments: Load balancer is properly configured and routing requests.
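To separate network problems from credential problems during these checks, a quick TCP probe from the GitLab node can help. A minimal sketch using bash's `/dev/tcp`; the host and port are placeholders, so substitute your ClickHouse host (or load balancer) and the port GitLab is configured to use:

```shell
host=localhost   # placeholder: your ClickHouse host or load balancer
port=8123        # placeholder: HTTP port (8443 or 9440 when TLS is enabled)

# Attempt a plain TCP connection with a 3-second timeout.
if timeout 3 bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null; then
  status=reachable
else
  status=unreachable
fi
echo "${host}:${port} is ${status}"
```

If the port is reachable but the GitLab connection still fails, the problem is more likely TLS configuration or credentials than the network path.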
Run ClickHouse migrations
To create the required database objects:

For Linux package installations, execute:

```shell
sudo gitlab-rake gitlab:clickhouse:migrate
```

For Helm chart installations, migrations are executed automatically by the GitLab-Migrations chart. Alternatively, you can run migrations by executing the following command in the Toolbox pod:

```shell
gitlab-rake gitlab:clickhouse:migrate
```

Enable ClickHouse for Analytics
After your GitLab instance is connected to ClickHouse, you can enable features that use ClickHouse:
Prerequisites:
- You must have administrator access to the instance.
- The ClickHouse connection is configured and verified.
- Migrations have completed successfully.
To enable ClickHouse for Analytics:
- On the left sidebar, at the bottom, select Admin.
- Select Settings > General.
- Expand ClickHouse.
- Select Enable ClickHouse for Analytics.
- Select Save changes.
Disable ClickHouse for Analytics
To disable ClickHouse for Analytics:
Prerequisites:
- You must have administrator access to the instance.
To disable:
- On the left sidebar, at the bottom, select Admin.
- Select Settings > General.
- Expand ClickHouse.
- Clear the Enable ClickHouse for Analytics checkbox.
- Select Save changes.
Disabling ClickHouse for Analytics stops GitLab from querying ClickHouse but does not delete any data from your ClickHouse instance. Analytics features that rely on ClickHouse will fall back to alternative data sources or become unavailable.
Upgrade ClickHouse
ClickHouse Cloud
ClickHouse Cloud automatically handles version upgrades and security patches. No manual intervention is required.
For information about upgrade scheduling and maintenance windows, see the ClickHouse Cloud documentation.
ClickHouse Cloud notifies you in advance of upcoming upgrades. Review the ClickHouse Cloud changelog to stay informed about new features and changes.
ClickHouse for GitLab Self-Managed (BYOC)
For ClickHouse for GitLab Self-Managed, you are responsible for planning and executing version upgrades.
Prerequisites:
- Have administrator access to the ClickHouse instance.
- Back up your data before upgrading. See Disaster recovery.
Before upgrading:
- Review the ClickHouse release notes for breaking changes.
- Check compatibility with your GitLab version.
- Test the upgrade in a non-production environment.
- Plan for potential downtime, or use a rolling upgrade strategy for HA clusters.
To upgrade ClickHouse:
- For single-node deployments, follow the ClickHouse upgrade documentation.
- For HA cluster deployments, perform a rolling upgrade to minimize downtime:
- Upgrade one node at a time.
- Wait for the node to rejoin the cluster.
- Verify cluster health before proceeding to the next node.
Always ensure the ClickHouse version remains compatible with your GitLab version. Incompatible versions might cause indexing to pause and features to fail. For more information, see supported ClickHouse versions.
For detailed upgrade procedures, see the ClickHouse documentation on updates.
Operations
Check migration status
Prerequisites:
- You must have administrator access to the instance.
To check the status of ClickHouse migrations:
- On the left sidebar, at the bottom, select Admin.
- Select Settings > General.
- Expand ClickHouse.
- Review the Migration status section if available.
Alternatively, check for pending migrations using the Rails console:

```ruby
# Check for pending ClickHouse migrations
ClickHouse::MigrationSupport::Migrator.new(:main).pending_migrations
```

Retry failed migrations
If a ClickHouse migration fails:
- Check the logs for error details. ClickHouse-related errors are logged in the GitLab application logs.
- Address the underlying issue (for example, insufficient memory or connectivity problems).
- Retry the migration:

  ```shell
  # For installations that use the Linux package
  sudo gitlab-rake gitlab:clickhouse:migrate

  # For self-compiled installations
  bundle exec rake gitlab:clickhouse:migrate RAILS_ENV=production
  ```
Migrations are designed to be idempotent and safe to retry. If a migration fails partway through, running it again resumes from where it left off or skips already-completed steps.
ClickHouse Rake tasks
GitLab provides several Rake tasks for managing your ClickHouse database.
The following Rake tasks are available:
| Task | Description |
|---|---|
| `sudo gitlab-rake gitlab:clickhouse:migrate` | Runs all pending ClickHouse migrations to create or update the database schema. |
| `sudo gitlab-rake gitlab:clickhouse:drop` | Drops all ClickHouse databases. Use with extreme caution, as this deletes all data. |
| `sudo gitlab-rake gitlab:clickhouse:create` | Creates ClickHouse databases if they do not exist. |
| `sudo gitlab-rake gitlab:clickhouse:setup` | Creates databases and runs all migrations. Equivalent to running the create and migrate tasks. |
| `sudo gitlab-rake gitlab:clickhouse:schema:dump` | Dumps the current database schema to a file for backup or version control. |
| `sudo gitlab-rake gitlab:clickhouse:schema:load` | Loads the database schema from a dump file. |
For self-compiled installations, use `bundle exec rake` instead of `sudo gitlab-rake` and add `RAILS_ENV=production` to the end of the command.
Common task examples
Verify ClickHouse connection and schema
To verify your ClickHouse connection is working:
```shell
# For installations that use the Linux package
sudo gitlab-rake gitlab:clickhouse:info

# For self-compiled installations
bundle exec rake gitlab:clickhouse:info RAILS_ENV=production
```

This task outputs debugging information about the ClickHouse connection and configuration.
Re-run all migrations
To run all pending migrations:
```shell
# For installations that use the Linux package
sudo gitlab-rake gitlab:clickhouse:migrate

# For self-compiled installations
bundle exec rake gitlab:clickhouse:migrate RAILS_ENV=production
```

Reset the database
This deletes all data in your ClickHouse database. Use only in development or when troubleshooting.
To drop and recreate the database:
```shell
# For installations that use the Linux package
sudo gitlab-rake gitlab:clickhouse:drop
sudo gitlab-rake gitlab:clickhouse:setup

# For self-compiled installations
bundle exec rake gitlab:clickhouse:drop RAILS_ENV=production
bundle exec rake gitlab:clickhouse:setup RAILS_ENV=production
```

Environment variables
You can use environment variables to control Rake task behavior:
| Environment Variable | Data Type | Description |
|---|---|---|
| `VERBOSE` | Boolean | Set to `true` to see detailed output during migrations. Example: `VERBOSE=true sudo gitlab-rake gitlab:clickhouse:migrate` |
Performance tuning
For resource sizing and deployment recommendations based on your user count, see system requirements.
For information about ClickHouse architecture and performance tuning, see the ClickHouse documentation on architecture.
Disaster recovery
Backup and Restore
You should perform a full backup before upgrading the GitLab application. ClickHouse data is not included in GitLab backup tooling.
Backup and restore strategy depends on the choice of deployment.
ClickHouse Cloud
ClickHouse Cloud automatically:

- Manages backups and restores.
- Creates and retains daily backups.

No additional configuration is required.
For more information, see ClickHouse Cloud backups.
ClickHouse for GitLab Self-Managed
If you manage your own ClickHouse instance, you should take regular backups to ensure data safety:

- Take an initial full backup of tables (excluding system tables like `metrics` or `logs`) to an object storage bucket, for example AWS S3.
- Take incremental backups after this initial full backup.

This duplicates data for every full backup, but is the easiest approach to restore data.
Alternatively, use clickhouse-backup. This is a third-party tool that provides similar functionality with additional features like scheduling and remote storage management.
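The full-plus-incremental approach above can be expressed with ClickHouse's native `BACKUP` statement (available in the supported 23.x and later versions). A sketch that writes the statements to a file for use with `clickhouse-client`; the bucket URL and the `KEY_ID`/`SECRET` credentials are placeholders:

```shell
# Illustrative BACKUP statements; replace BUCKET, KEY_ID, and SECRET
# with your object storage bucket and credentials before running.
cat > /tmp/gitlab-ch-backup.sql <<'EOF'
-- Full backup of the GitLab database to object storage
BACKUP DATABASE gitlab_clickhouse_main_production
  TO S3('https://s3.amazonaws.com/BUCKET/full', 'KEY_ID', 'SECRET');

-- Incremental backup, based on the earlier full backup
BACKUP DATABASE gitlab_clickhouse_main_production
  TO S3('https://s3.amazonaws.com/BUCKET/incremental-1', 'KEY_ID', 'SECRET')
  SETTINGS base_backup = S3('https://s3.amazonaws.com/BUCKET/full', 'KEY_ID', 'SECRET');
EOF

# Confirm both statements were written.
grep -c 'BACKUP DATABASE' /tmp/gitlab-ch-backup.sql
```

Run the statements against your instance, for example with `clickhouse-client --queries-file /tmp/gitlab-ch-backup.sql`. Restoring uses the matching `RESTORE DATABASE ... FROM S3(...)` form.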
Monitoring
To ensure the stability of the GitLab integration, you should monitor the health and performance of your ClickHouse cluster.
ClickHouse Cloud
ClickHouse Cloud provides a native Prometheus integration that exposes metrics through a secure API endpoint. After generating the API credentials, you can configure collectors, for example a Prometheus deployment, to scrape metrics from ClickHouse Cloud.
ClickHouse for GitLab Self-Managed
ClickHouse can expose metrics in Prometheus format. To enable this:

- Configure the `prometheus` section in your `config.xml` to expose metrics on a dedicated port (default is `9363`):

  ```xml
  <prometheus>
      <endpoint>/metrics</endpoint>
      <port>9363</port>
      <metrics>true</metrics>
      <events>true</events>
      <asynchronous_metrics>true</asynchronous_metrics>
  </prometheus>
  ```

- Configure Prometheus or a similar compatible server to scrape `http://<clickhouse-host>:9363/metrics`.
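On the Prometheus side, the matching scrape job is a few lines of configuration. A minimal sketch, assuming hypothetical nodes `ch-node1` through `ch-node3` exposing metrics on port `9363`:

```shell
# Illustrative Prometheus scrape job (written to /tmp for inspection;
# merge it into your real prometheus.yml). Node names are placeholders.
cat > /tmp/clickhouse-scrape.yml <<'EOF'
scrape_configs:
  - job_name: clickhouse
    metrics_path: /metrics
    static_configs:
      - targets: ['ch-node1:9363', 'ch-node2:9363', 'ch-node3:9363']
EOF

cat /tmp/clickhouse-scrape.yml
```

For an HA cluster, list every node as a target so a failing replica is still visible to alerting even when the load balancer routes around it.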
Metrics to monitor
You should set up alerts for the following metrics to detect issues that may impact GitLab features:
| Metric Name | Description | Alert Threshold (Recommendation) |
|---|---|---|
| `ClickHouseMetrics_Query` | Number of queries currently executing. A sudden spike might indicate a performance bottleneck. | Baseline deviation (for example, > 100) |
| `ClickHouseProfileEvents_FailedSelectQuery` | Number of failed `SELECT` queries. | Baseline deviation (for example, > 50) |
| `ClickHouseProfileEvents_FailedInsertQuery` | Number of failed `INSERT` queries. | Baseline deviation (for example, > 10) |
| `ClickHouseAsyncMetrics_ReadonlyReplica` | Indicates if a replica has gone into read-only mode (often due to ZooKeeper connection loss). | > 0 (take immediate action) |
| `ClickHouseProfileEvents_NetworkErrors` | Network errors (connection resets or timeouts). Frequent errors might cause GitLab background jobs to fail. | Rate > 0 |
Liveness check
If ClickHouse is available behind a load balancer, you can use the HTTP `/ping` endpoint to check for liveness. The expected response is `Ok` with HTTP code `200`.
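A sketch of the probe with `curl`; the URL is a placeholder, so substitute your load balancer or ClickHouse host:

```shell
# Placeholder URL: substitute your load balancer or ClickHouse host.
CLICKHOUSE_URL="http://your-load-balancer:8080"

# Capture only the HTTP status code; a healthy node returns 200.
response=$(curl -s -o /dev/null -w '%{http_code}' "${CLICKHOUSE_URL}/ping")
echo "HTTP status: ${response}"
```

The same check works as a health-check command in load balancer or orchestrator configuration, since `/ping` requires no authentication.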
Security and auditing
To ensure the security of your data and enable auditability, use the following security practices.
Network security
- TLS encryption: Configure ClickHouse servers to use TLS to encrypt connections. When configuring the connection URL in GitLab, use the `https://` protocol (for example, `https://clickhouse.example.com:8443`).
- IP allow lists: Restrict access to the ClickHouse port (default `8443` or `9440`) to only the GitLab application nodes and other authorized networks.
Audit logging
The GitLab application does not maintain a separate audit log for individual ClickHouse queries. To satisfy specific requirements regarding data access (who queried what, and when), you can enable logging on the ClickHouse side.
ClickHouse Cloud
In ClickHouse Cloud, query logging is enabled by default.
You can access these logs by querying the `system.query_log` table.
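For example, a query along these lines surfaces recent activity by the `gitlab` application user. The columns are standard `system.query_log` columns, and the one-day window and row limit are arbitrary choices; the query is written to a file here so the sketch is self-contained:

```shell
# Illustrative audit query over the ClickHouse query log; pipe the file to
# clickhouse-client, or paste its contents into the ClickHouse Cloud SQL console.
cat > /tmp/audit-query.sql <<'EOF'
SELECT event_time, user, query_duration_ms, substring(query, 1, 120) AS query_head
FROM system.query_log
WHERE type = 'QueryFinish'
  AND user = 'gitlab'
  AND event_time > now() - INTERVAL 1 DAY
ORDER BY event_time DESC
LIMIT 20
EOF

cat /tmp/audit-query.sql
```

Filtering on `type = 'QueryFinish'` avoids double-counting, because `system.query_log` records separate rows for query start and finish.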
ClickHouse for GitLab Self-Managed
For self-managed instances, ensure the `query_log` configuration parameter is enabled in your server configuration:

- Verify that the `query_log` section exists in your `config.xml` or `users.xml`:

  ```xml
  <query_log>
      <database>system</database>
      <table>query_log</table>
      <partition_by>toYYYYMM(event_date)</partition_by>
      <flush_interval_milliseconds>7500</flush_interval_milliseconds>
      <ttl>event_date + INTERVAL 30 DAY</ttl> <!-- Keep only 30 days -->
  </query_log>
  ```

- Once enabled, all executed queries are recorded in the `system.query_log` table, providing an audit trail.
System requirements
The recommended system requirements change depending on the number of users.
Deployment decision matrix quick reference
| Users | Primary Recommendation | Comparable AWS ARM Instance | Comparable GCP ARM Instance | Deployment Type |
|---|---|---|---|---|
| 1K | ClickHouse Cloud Basic | - | - | Managed |
| 2K | ClickHouse Cloud Basic | m8g.xlarge | c4a-standard-4 | Managed or Single Node |
| 3K | ClickHouse Cloud Scale | m8g.2xlarge | c4a-standard-8 | Managed or Single Node |
| 5K | ClickHouse Cloud Scale | m8g.4xlarge | c4a-standard-16 | Managed or Single Node |
| 10K | ClickHouse Cloud Scale | m8g.4xlarge | c4a-standard-16 | Managed or Single Node/HA |
| 25K | ClickHouse for GitLab Self-Managed or ClickHouse Cloud Scale | m8g.8xlarge or 3×m8g.4xlarge | c4a-standard-32 or 3×c4a-standard-16 | Managed or Single Node/HA |
| 50K | ClickHouse for GitLab Self-Managed high availability (HA) or ClickHouse Cloud Scale | 3×m8g.4xlarge | 3×c4a-standard-16 | Managed or HA Cluster |
1K Users
Recommendation: ClickHouse Cloud Basic as it provides good cost efficiency with no operational complexity.
2K Users
Recommendation: ClickHouse Cloud Basic as it offers best value with no operational complexity.
Alternative recommendation for ClickHouse for GitLab Self-Managed deployment:
- AWS: m8g.xlarge (4 vCPU, 16 GB)
- GCP: c4a-standard-4 or n4-standard-4 (4 vCPU, 16 GB)
- Storage: 20 GB with low-medium performance tier
3K Users
Recommendation: ClickHouse Cloud Scale
Alternative recommendation for ClickHouse for GitLab Self-Managed deployment:
- AWS: m8g.2xlarge (8 vCPU, 32 GB)
- GCP: c4a-standard-8 or n4-standard-8 (8 vCPU, 32 GB)
- Storage: 100 GB with medium performance tier
Note: HA deployments are not cost-effective at this scale.
5K Users
Recommendation: ClickHouse Cloud Scale
Alternative recommendation for ClickHouse for GitLab Self-Managed deployment:
- AWS: m8g.4xlarge (16 vCPU, 64 GB)
- GCP: c4a-standard-16 or n4-standard-16 (16 vCPU, 64 GB)
- Storage: 100 GB with high performance tier
- Deployment: Single node recommended
10K Users
Recommendation: ClickHouse Cloud Scale
Alternative recommendation for ClickHouse for GitLab Self-Managed deployment:
- AWS: m8g.4xlarge (16 vCPU, 64 GB)
- GCP: c4a-standard-16 or n4-standard-16 (16 vCPU, 64 GB)
- Storage: 200 GB with high performance tier
- HA Option: 3-node cluster becomes viable for critical workloads
25K Users
Recommendation: ClickHouse Cloud Scale or ClickHouse for GitLab Self-Managed. Both options are economically feasible at this scale.
Recommendations for ClickHouse for GitLab Self-Managed deployment:
Single Node:
- AWS: m8g.8xlarge (32 vCPU, 128 GB)
- GCP: c4a-standard-32 or n4-standard-32 (32 vCPU, 128 GB)
HA Deployment:
- AWS: 3 × m8g.4xlarge (16 vCPU, 64 GB each)
- GCP: 3 × c4a-standard-16 or 3 × n4-standard-16 (16 vCPU, 64 GB each)
Storage: 400 GB per node with high performance tier.
50K Users
Recommendation: ClickHouse for GitLab Self-Managed HA or ClickHouse Cloud Scale. The self-managed option is slightly more cost-effective at this scale.
Recommendations for ClickHouse for GitLab Self-Managed deployment:
Single Node:
- AWS: m8g.8xlarge (32 vCPU, 128 GB)
- GCP: c4a-standard-32 or n4-standard-32 (32 vCPU, 128 GB)
HA Deployment (Preferred):
- AWS: 3 × m8g.4xlarge (16 vCPU, 64 GB each)
- GCP: 3 × c4a-standard-16 or 3 × n4-standard-16 (16 vCPU, 64 GB each)
Storage: 1000 GB per node with high performance tier.
HA considerations for ClickHouse for GitLab Self-Managed deployment
An HA setup becomes cost-effective only at 10K users or above.
- Minimum: Three ClickHouse nodes for quorum.
- ClickHouse Keeper: Three nodes for coordination (can be co-located or separate).
- Load balancer: Recommended for distributing queries.
- Network: Low-latency connectivity between nodes is critical.
Glossary
- Cluster: A collection of nodes (servers) that work together to store and process data.
- MergeTree: A table engine in ClickHouse designed for high data ingest rates and large data volumes. It is the core storage engine in ClickHouse, providing features such as columnar storage, custom partitioning, sparse primary indexes, and support for background data merges.
- Parts: A physical file on a disk that stores a portion of the table's data. A part is different from a partition, which is a logical division of a table's data that is created using a partition key.
- Replica: A copy of the data stored in a ClickHouse database. You can have any number of replicas of the same data for redundancy and reliability. Replicas are used in conjunction with the ReplicatedMergeTree table engine, which enables ClickHouse to keep multiple copies of data in sync across different servers.
- Shard: A subset of data. ClickHouse always has at least one shard for your data. If you do not split the data across multiple servers, your data is stored in one shard. Sharding data across multiple servers can be used to divide the load if you exceed the capacity of a single server.
- TTL (Time To Live): Time To Live (TTL) is a ClickHouse feature that automatically moves, deletes, or rolls up columns/rows after a certain time period. This allows you to manage storage more efficiently because you can delete, move, or archive the data that you no longer need to access frequently.
Troubleshooting
Database schema migrations on GitLab 18.0.0 and earlier
On GitLab 18.0.0 and earlier, running database schema migrations for ClickHouse may fail on ClickHouse 24.x and 25.x with the following error message:

```plaintext
Code: 344. DB::Exception: Projection is fully supported in ReplacingMergeTree with deduplicate_merge_projection_mode = throw. Use 'drop' or 'rebuild' option of deduplicate_merge_projection_mode
```

Without running all migrations, the ClickHouse integration does not work.
To work around this issue and run the migrations:
- Sign in to the Rails console.
- Execute the following command:

  ```ruby
  ClickHouse::Client.execute("INSERT INTO schema_migrations (version) VALUES ('20231114142100'), ('20240115162101')", :main)
  ```

- Migrate the database again:

  ```shell
  sudo gitlab-rake gitlab:clickhouse:migrate
  ```

  This time, the database migration should finish successfully.