Tiered Storages in ClickHouse
The MergeTree table engine in ClickHouse supports tiered storage. See the documentation for Using Multiple Block Devices for Data Storage for details on setup and further explanation.
Quoting from the MergeTree documentation:
MergeTree family table engines can store data on multiple block devices. For example, it can be useful when the data of a certain table are implicitly split into “hot” and “cold”. The most recent data is regularly requested but requires only a small amount of space. On the contrary, the fat-tailed historical data is requested rarely.
Combined with remote storage backends such as Amazon S3, this makes for a very efficient storage scheme. Storage policies let data stay on local disks for a period of time and then move to object storage.
An example configuration can look like this:
<storage_configuration>
<disks>
<fast_ssd>
<path>/mnt/fast_ssd/clickhouse/</path>
</fast_ssd>
<gcs>
<support_batch_delete>false</support_batch_delete>
<type>s3</type>
<endpoint>https://storage.googleapis.com/${BUCKET_NAME}/${ROOT_FOLDER}/</endpoint>
<access_key_id>${SERVICE_ACCOUNT_HMAC_KEY}</access_key_id>
<secret_access_key>${SERVICE_ACCOUNT_HMAC_SECRET}</secret_access_key>
<metadata_path>/var/lib/clickhouse/disks/gcs/</metadata_path>
</gcs>
...
</disks>
...
<policies>
<move_from_local_disks_to_gcs> <!-- policy name -->
<volumes>
<hot> <!-- volume name -->
<disk>fast_ssd</disk> <!-- disk name -->
</hot>
<cold>
<disk>gcs</disk>
</cold>
</volumes>
<move_factor>0.2</move_factor>
<!-- The move factor determines when to move data from hot volume to cold.
See ClickHouse docs for more details. -->
</move_from_local_disks_to_gcs>
....
</policies>
</storage_configuration>
This storage policy defines two volumes, hot and cold. When the free space on the hot volume drops below disk_size * move_factor (20% of the disk with the setting above), ClickHouse starts moving data to Google Cloud Storage (GCS).
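To check that the policy was loaded and to see how close the hot volume is to the move threshold, you can query the ClickHouse system tables. The following is a minimal sketch that assumes the policy and disk names from the example configuration above. ClickHouse also accounts for reserved space when deciding to move data, so treat the ratio as approximate.

```sql
-- Confirm that the policy and its volumes were loaded from the server configuration.
SELECT policy_name, volume_name, volume_priority, disks, move_factor
FROM system.storage_policies
WHERE policy_name = 'move_from_local_disks_to_gcs'
ORDER BY volume_priority;

-- Compare the free-space ratio of the local disk with move_factor (0.2 in the example).
-- Moves to the cold volume are expected once free_ratio drops below move_factor.
SELECT
    name,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    round(free_space / total_space, 3) AS free_ratio
FROM system.disks
WHERE name = 'fast_ssd';
```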
If this storage policy is not the default, attach it explicitly when you create a table. For example:
CREATE TABLE key_value_table (
event_date Date,
key String,
value String
) ENGINE = MergeTree
ORDER BY (key)
PARTITION BY toYYYYMM(event_date)
SETTINGS storage_policy = 'move_from_local_disks_to_gcs'
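After data has been inserted, one way to see which disk each part is stored on is to query system.parts. This is a sketch that assumes the table was created in the current database.

```sql
-- List active parts of the example table, grouped by partition and disk.
SELECT
    partition,
    disk_name,
    count() AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'key_value_table'
  AND active
GROUP BY partition, disk_name
ORDER BY partition, disk_name;
```

Parts that have been moved to the cold volume show gcs in the disk_name column.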
With this storage policy, the move happens implicitly. It is also possible to keep hot data on local disks for a fixed period of time and then move it to cold storage.
You can do this with table TTLs, which are also available with the MergeTree table engine.
The ClickHouse documentation describes this feature in detail, using the example of a hot-warm-cold architecture.
You can take a similar approach for the example shown above. First, adjust the storage policy:
<storage_configuration>
...
<policies>
<local_disk_and_gcs> <!-- policy name -->
<volumes>
<hot> <!-- volume name -->
<disk>fast_ssd</disk> <!-- disk name -->
</hot>
<cold>
<disk>gcs</disk>
</cold>
</volumes>
</local_disk_and_gcs>
....
</policies>
</storage_configuration>
Then create the table as:
CREATE TABLE another_key_value_table (
event_date Date,
key String,
value String
) ENGINE = MergeTree
ORDER BY (key)
PARTITION BY toYYYYMM(event_date)
TTL
event_date TO VOLUME 'hot',
event_date + INTERVAL 1 YEAR TO VOLUME 'cold'
SETTINGS storage_policy = 'local_disk_and_gcs';
This creates the table so that data older than 1 year (evaluated against the event_date column) is moved to GCS. Such a storage policy can be helpful for append-only tables (like audit events) where only the most recent data is accessed frequently.
You can also use a TTL to delete data altogether after a given period, which can be a regulatory requirement.
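As an illustration, a table TTL can combine a move rule with a delete rule. The table name and the 5-year retention period in this sketch are hypothetical and are not taken from any actual requirement. It reuses the local_disk_and_gcs policy defined above.

```sql
-- Hypothetical table: the name and the 5-year retention period are illustrative only.
CREATE TABLE audit_events_with_retention (
    event_date Date,
    key String,
    value String
) ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (key)
TTL
    event_date + INTERVAL 1 YEAR TO VOLUME 'cold',  -- move to GCS after 1 year
    event_date + INTERVAL 5 YEAR DELETE             -- delete rows after 5 years
SETTINGS storage_policy = 'local_disk_and_gcs';
```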
This guide does not cover modifying TTLs, but that is possible as well. See the ClickHouse documentation on modifying TTL for details.