External disks for storing data
Data processed in ClickHouse is usually stored in the local file system of the machine on which ClickHouse server is running. That requires large-capacity disks, which can be expensive. To avoid storing data locally, various storage options are supported:
- Amazon S3 object storage.
- Azure Blob Storage.
- Unsupported: The Hadoop Distributed File System (HDFS)
ClickHouse also has support for external table engines, which are different from the external storage option described on this page, as they allow reading data stored in some general file format (like Parquet). On this page we describe storage configuration for the ClickHouse MergeTree family or Log family tables.
- To work with data stored on Amazon S3 disks, use the S3 table engine.
- To work with data stored in Azure Blob Storage, use the AzureBlobStorage table engine.
- To work with data in the Hadoop Distributed File System (unsupported), use the HDFS table engine.
Configure external storage
MergeTree and Log family table engines can store data to S3, AzureBlobStorage, or HDFS (unsupported) using a disk with type s3, azure_blob_storage, or hdfs (unsupported) respectively.

Disk configuration requires:
- A type section, equal to one of s3, azure_blob_storage, hdfs (unsupported), local_blob_storage, web.
- Configuration of a specific external storage type.
Starting from ClickHouse version 24.1, it is possible to use a new configuration option. It requires specifying:
- A type equal to object_storage
- An object_storage_type equal to one of s3, azure_blob_storage (or just azure from 24.3), hdfs (unsupported), local_blob_storage (or just local from 24.3), web.

Optionally, metadata_type can be specified (it is equal to local by default), but it can also be set to plain, web and, starting from 24.4, plain_rewritable.
Usage of the plain metadata type is described in the plain storage section. The web metadata type can be used only with the web object storage type. The local metadata type stores metadata files locally (each metadata file contains a mapping to files in object storage plus some additional meta information about them).
For example:
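The sketch below declares an s3 disk with the pre-24.1 syntax (the endpoint is a placeholder; credentials are assumed to come from environment variables):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3>
        </disks>
    </storage_configuration>
</clickhouse>
```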
is equal to the following configuration (from version 24.1):
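The same placeholder disk, expressed with the object_storage type:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>local</metadata_type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3>
        </disks>
    </storage_configuration>
</clickhouse>
```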
The following configuration:
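(sketched here as an s3_plain disk with a placeholder endpoint)

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_plain>
                <type>s3_plain</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3_plain>
        </disks>
    </storage_configuration>
</clickhouse>
```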
is equal to:
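(the object_storage form with plain metadata, same placeholder endpoint)

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_plain>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>plain</metadata_type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3_plain>
        </disks>
    </storage_configuration>
</clickhouse>
```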
An example of a full storage configuration will look like this:
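The sketch below assumes a placeholder s3 endpoint, with credentials taken from environment variables:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3>
        </disks>
        <policies>
            <s3>
                <volumes>
                    <main>
                        <disk>s3</disk>
                    </main>
                </volumes>
            </s3>
        </policies>
    </storage_configuration>
</clickhouse>
```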
Starting with version 24.1, it can also look like:
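(the same sketch, using the object_storage type)

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>local</metadata_type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3>
        </disks>
        <policies>
            <s3>
                <volumes>
                    <main>
                        <disk>s3</disk>
                    </main>
                </volumes>
            </s3>
        </policies>
    </storage_configuration>
</clickhouse>
```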
To make a specific kind of storage a default option for all MergeTree tables, add the following section to the configuration file:
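A sketch, assuming the s3 policy defined in the example above:

```xml
<clickhouse>
    <merge_tree>
        <storage_policy>s3</storage_policy>
    </merge_tree>
</clickhouse>
```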
If you want to configure a specific storage policy for a specific table, you can define it in settings while creating the table:
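For example, a sketch with a hypothetical table and the s3 policy from above:

```sql
CREATE TABLE test (a Int32, b String)
ENGINE = MergeTree()
ORDER BY a
SETTINGS storage_policy = 's3';
```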
You can also use disk instead of storage_policy. In this case it is not necessary to have the storage_policy section in the configuration file, and a disk section is enough.
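The same hypothetical table, referencing the disk directly:

```sql
CREATE TABLE test (a Int32, b String)
ENGINE = MergeTree()
ORDER BY a
SETTINGS disk = 's3';
```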
Dynamic Configuration
It is also possible to specify a storage configuration without a predefined disk in the configuration file; instead, the disk can be configured in the CREATE/ATTACH query settings.
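For example, a sketch of such a query (the table schema, UUID and endpoint are placeholders; the UUID must match the directory name of data prepared for web storage):

```sql
ATTACH TABLE hits UUID '12345678-1234-1234-1234-123456789abc'
(
    EventDate Date,
    CounterID UInt32,
    UserID UInt64
)
ENGINE = MergeTree
ORDER BY (CounterID, EventDate)
-- the disk is defined inline in the query settings rather than in the configuration file
SETTINGS disk = disk(type=web, endpoint='https://example.com/static/web/');
```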
The following example query builds on the above dynamic disk configuration and shows how to use a local disk to cache data from a table stored at a URL. The example below adds a cache to external storage. In the settings below, notice that the disk of type=web is nested within the disk of type=cache.
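A sketch of such a query (same placeholder table and endpoint as above; the cache path and size are also placeholders):

```sql
ATTACH TABLE hits UUID '12345678-1234-1234-1234-123456789abc'
(
    EventDate Date,
    CounterID UInt32,
    UserID UInt64
)
ENGINE = MergeTree
ORDER BY (CounterID, EventDate)
SETTINGS disk = disk(
    type=cache,
    max_size='10Gi',
    path='/var/lib/clickhouse/custom_disk_cache/',
    -- the inner disk of type=web is nested inside the disk of type=cache
    disk=disk(type=web, endpoint='https://example.com/static/web/')
);
```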
The example uses type=web, but any disk type can be configured as dynamic, including local disk. Local disks require a path argument to be inside the server config parameter custom_local_disks_base_directory, which has no default, so set that also when using local disk.
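For example, a sketch of that server config parameter (the directory is a placeholder):

```xml
<clickhouse>
    <custom_local_disks_base_directory>/user_defined_disks/</custom_local_disks_base_directory>
</clickhouse>
```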
A combination of configuration-file-based and SQL-defined configuration is also possible:
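A sketch (placeholder table, cache path and size; the inner web disk is referenced by name rather than defined inline):

```sql
ATTACH TABLE hits UUID '12345678-1234-1234-1234-123456789abc'
(
    EventDate Date,
    CounterID UInt32,
    UserID UInt64
)
ENGINE = MergeTree
ORDER BY (CounterID, EventDate)
SETTINGS disk = disk(
    type=cache,
    max_size='10Gi',
    path='/var/lib/clickhouse/custom_disk_cache/',
    disk='web'  -- refers to a disk declared in the server configuration file
);
```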
where web is from the server configuration file:
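A sketch with a placeholder endpoint:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <web>
                <type>web</type>
                <endpoint>https://example.com/static/web/</endpoint>
            </web>
        </disks>
    </storage_configuration>
</clickhouse>
```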
Using S3 Storage
Required parameters
Parameter | Description |
---|---|
endpoint | S3 endpoint URL in path or virtual hosted styles. Should include the bucket and root path for data storage. |
access_key_id | S3 access key ID used for authentication. |
secret_access_key | S3 secret access key used for authentication. |
Optional parameters
Parameter | Description | Default Value |
---|---|---|
region | S3 region name. | - |
support_batch_delete | Controls whether to check for batch delete support. Set to false when using Google Cloud Storage (GCS) as GCS doesn't support batch deletes. | true |
use_environment_credentials | Reads AWS credentials from environment variables: AWS_ACCESS_KEY_ID , AWS_SECRET_ACCESS_KEY , and AWS_SESSION_TOKEN if they exist. | false |
use_insecure_imds_request | If true , uses insecure IMDS request when obtaining credentials from Amazon EC2 metadata. | false |
expiration_window_seconds | Grace period (in seconds) for checking if expiration-based credentials have expired. | 120 |
proxy | Proxy configuration for S3 endpoint. Each uri element inside proxy block should contain a proxy URL. | - |
connect_timeout_ms | Socket connect timeout in milliseconds. | 10000 (10 seconds) |
request_timeout_ms | Request timeout in milliseconds. | 5000 (5 seconds) |
retry_attempts | Number of retry attempts for failed requests. | 10 |
single_read_retries | Number of retry attempts for connection drops during read. | 4 |
min_bytes_for_seek | Minimum number of bytes to use seek operation instead of sequential read. | 1 MB |
metadata_path | Local filesystem path to store S3 metadata files. | /var/lib/clickhouse/disks/<disk_name>/ |
skip_access_check | If true , skips disk access checks during startup. | false |
header | Adds specified HTTP header to requests. Can be specified multiple times. | - |
server_side_encryption_customer_key_base64 | Required headers for accessing S3 objects with SSE-C encryption. | - |
server_side_encryption_kms_key_id | Required headers for accessing S3 objects with SSE-KMS encryption. Empty string uses AWS managed S3 key. | - |
server_side_encryption_kms_encryption_context | Encryption context header for SSE-KMS (used with server_side_encryption_kms_key_id ). | - |
server_side_encryption_kms_bucket_key_enabled | Enables S3 bucket keys for SSE-KMS (used with server_side_encryption_kms_key_id ). | Matches bucket-level setting |
s3_max_put_rps | Maximum PUT requests per second before throttling. | 0 (unlimited) |
s3_max_put_burst | Maximum concurrent PUT requests before hitting RPS limit. | Same as s3_max_put_rps |
s3_max_get_rps | Maximum GET requests per second before throttling. | 0 (unlimited) |
s3_max_get_burst | Maximum concurrent GET requests before hitting RPS limit. | Same as s3_max_get_rps |
read_resource | Resource name for scheduling read requests. | Empty string (disabled) |
write_resource | Resource name for scheduling write requests. | Empty string (disabled) |
key_template | Defines object key generation format using re2 syntax. Requires storage_metadata_write_full_object_key flag. Incompatible with root path in endpoint . Requires key_compatibility_prefix . | - |
key_compatibility_prefix | Required with key_template . Specifies the previous root path from endpoint for reading older metadata versions. | - |
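As a sketch, an s3 disk declaration combining a few of the parameters above might look like the following (endpoint, credentials and bucket are placeholders):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_disk>
                <type>s3</type>
                <!-- placeholder endpoint: bucket plus root path for data -->
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <access_key_id>YOUR_ACCESS_KEY_ID</access_key_id>
                <secret_access_key>YOUR_SECRET_ACCESS_KEY</secret_access_key>
                <metadata_path>/var/lib/clickhouse/disks/s3_disk/</metadata_path>
            </s3_disk>
        </disks>
        <policies>
            <s3_main>
                <volumes>
                    <main>
                        <disk>s3_disk</disk>
                    </main>
                </volumes>
            </s3_main>
        </policies>
    </storage_configuration>
</clickhouse>
```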
Google Cloud Storage (GCS) is also supported using the type s3. See GCS backed MergeTree.
Using Plain Storage
In 22.10 a new disk type s3_plain was introduced, which provides write-once storage. Configuration parameters for it are the same as for the s3 disk type. Unlike the s3 disk type, it stores data as is. In other words, instead of having randomly generated blob names, it uses normal file names (the same way as ClickHouse stores files on local disk) and does not store any metadata locally; instead, metadata is derived from the data on s3.

This disk type allows keeping a static version of the table, as it does not allow executing merges on the existing data and does not allow inserting new data. A use case for this disk type is to create backups on it, which can be done via BACKUP TABLE data TO Disk('plain_disk_name', 'backup_name'). Afterward, you can do RESTORE TABLE data AS data_restored FROM Disk('plain_disk_name', 'backup_name') or use ATTACH TABLE data (...) ENGINE = MergeTree() SETTINGS disk = 'plain_disk_name'.
Configuration:
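A sketch with a placeholder endpoint (credentials are assumed to come from the environment):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_plain>
                <type>s3_plain</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3_plain>
        </disks>
        <policies>
            <s3_plain>
                <volumes>
                    <main>
                        <disk>s3_plain</disk>
                    </main>
                </volumes>
            </s3_plain>
        </policies>
    </storage_configuration>
</clickhouse>
```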
Starting from 24.1 it is possible to configure any object storage disk (s3, azure, hdfs (unsupported), local) using the plain metadata type.
Configuration:
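A sketch of the same disk expressed with object_storage and the plain metadata type (placeholder endpoint):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_plain>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>plain</metadata_type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3_plain>
        </disks>
    </storage_configuration>
</clickhouse>
```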
Using S3 Plain Rewritable Storage
A new disk type s3_plain_rewritable was introduced in 24.4. Similar to the s3_plain disk type, it does not require additional storage for metadata files; instead, metadata is stored in S3. Unlike the s3_plain disk type, s3_plain_rewritable allows executing merges and supports INSERT operations.
Mutations and replication of tables are not supported.
A use case for this disk type is for non-replicated MergeTree tables. Although the s3 disk type is suitable for non-replicated MergeTree tables, you may opt for the s3_plain_rewritable disk type if you do not require local metadata for the table and are willing to accept a limited set of operations. This could be useful, for example, for system tables.
Configuration:
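A sketch with a placeholder endpoint:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <disk_s3_plain_rewritable>
                <type>s3_plain_rewritable</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </disk_s3_plain_rewritable>
        </disks>
    </storage_configuration>
</clickhouse>
```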
is equal to:
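(the object_storage form, same placeholder endpoint)

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <disk_s3_plain_rewritable>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>plain_rewritable</metadata_type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </disk_s3_plain_rewritable>
        </disks>
    </storage_configuration>
</clickhouse>
```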
Starting from 24.5 it is possible to configure any object storage disk (s3, azure, local) using the plain_rewritable metadata type.
Using Azure Blob Storage
MergeTree family table engines can store data to Azure Blob Storage using a disk with type azure_blob_storage.
Configuration markup:
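A sketch using Shared Key authentication; the account URL, container name, account name and account key below are placeholders:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <blob_storage_disk>
                <type>azure_blob_storage</type>
                <storage_account_url>http://account.blob.core.windows.net</storage_account_url>
                <container_name>my-container</container_name>
                <account_name>my-account</account_name>
                <account_key>my-account-key</account_key>
                <metadata_path>/var/lib/clickhouse/disks/blob_storage_disk/</metadata_path>
                <skip_access_check>false</skip_access_check>
            </blob_storage_disk>
        </disks>
        <policies>
            <blob_storage_policy>
                <volumes>
                    <main>
                        <disk>blob_storage_disk</disk>
                    </main>
                </volumes>
            </blob_storage_policy>
        </policies>
    </storage_configuration>
</clickhouse>
```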
Connection parameters
Parameter | Description | Default Value |
---|---|---|
storage_account_url (Required) | Azure Blob Storage account URL. Examples: http://account.blob.core.windows.net or http://azurite1:10000/devstoreaccount1 . | - |
container_name | Target container name. | default-container |
container_already_exists | Controls container creation behavior: - false : Creates a new container - true : Connects directly to existing container - Unset: Checks if container exists, creates if needed | - |
Authentication parameters (the disk will try all available methods and Managed Identity Credential):
Parameter | Description |
---|---|
connection_string | For authentication using a connection string. |
account_name | For authentication using Shared Key (used with account_key ). |
account_key | For authentication using Shared Key (used with account_name ). |
Limit parameters
Parameter | Description |
---|---|
s3_max_single_part_upload_size | Maximum size of a single block upload to Blob Storage. |
min_bytes_for_seek | Minimum size of a seekable region. |
max_single_read_retries | Maximum number of attempts to read a chunk of data from Blob Storage. |
max_single_download_retries | Maximum number of attempts to download a readable buffer from Blob Storage. |
thread_pool_size | Maximum number of threads for IDiskRemote instantiation. |
s3_max_inflight_parts_for_one_file | Maximum number of concurrent put requests for a single object. |
Other parameters
Parameter | Description | Default Value |
---|---|---|
metadata_path | Local filesystem path to store metadata files for Blob Storage. | /var/lib/clickhouse/disks/<disk_name>/ |
skip_access_check | If true , skips disk access checks during startup. | false |
read_resource | Resource name for scheduling read requests. | Empty string (disabled) |
write_resource | Resource name for scheduling write requests. | Empty string (disabled) |
metadata_keep_free_space_bytes | Amount of free metadata disk space to reserve. | - |
Examples of working configurations can be found in integration tests directory (see e.g. test_merge_tree_azure_blob_storage or test_azure_blob_storage_zero_copy_replication).
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
Using HDFS storage (Unsupported)
In this sample configuration (shown below):
- the disk is of type hdfs (unsupported)
- the data is hosted at hdfs://hdfs1:9000/clickhouse/
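A minimal sketch of such a configuration:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <hdfs>
                <type>hdfs</type>
                <endpoint>hdfs://hdfs1:9000/clickhouse/</endpoint>
            </hdfs>
        </disks>
        <policies>
            <hdfs>
                <volumes>
                    <main>
                        <disk>hdfs</disk>
                    </main>
                </volumes>
            </hdfs>
        </policies>
    </storage_configuration>
</clickhouse>
```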
By the way, HDFS is unsupported and therefore there might be issues when using it. Feel free to make a pull request with the fix if any issue arises.
Keep in mind that HDFS may not work in corner cases.
Using Data Encryption
You can encrypt the data stored on S3 or HDFS (unsupported) external disks, or on a local disk. To turn on encryption mode, you must define a disk with the type encrypted in the configuration file and choose a disk on which the data will be saved. An encrypted disk ciphers all written files on the fly, and when you read files from an encrypted disk it deciphers them automatically. So you can work with an encrypted disk like with a normal one.
Example of disk configuration:
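A sketch matching the description below: disk1 is a local disk at /path1/, and disk2 is an encrypted wrapper that stores its data under path2/ on disk1 (the paths and the 16-character key are placeholders):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <disk1>
                <type>local</type>
                <path>/path1/</path>
            </disk1>
            <disk2>
                <type>encrypted</type>
                <disk>disk1</disk>
                <path>path2/</path>
                <!-- placeholder 16-byte key for the default AES_128_CTR algorithm -->
                <key>1234567812345678</key>
            </disk2>
        </disks>
    </storage_configuration>
</clickhouse>
```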
For example, when ClickHouse writes data from some table to a file store/all_1_1_0/data.bin on disk1, then in fact this file will be written to the physical disk along the path /path1/store/all_1_1_0/data.bin. When writing the same file to disk2, it will actually be written to the physical disk at the path /path1/path2/store/all_1_1_0/data.bin in encrypted mode.
Required Parameters
Parameter | Type | Description |
---|---|---|
type | String | Must be set to encrypted to create an encrypted disk. |
disk | String | Type of disk to use for underlying storage. |
key | Uint64 | Key for encryption and decryption. Can be specified in hexadecimal using key_hex . Multiple keys can be specified using the id attribute. |
Optional Parameters
Parameter | Type | Default | Description |
---|---|---|---|
path | String | Root directory | Location on the disk where data will be saved. |
current_key_id | String | - | The key ID used for encryption. All specified keys can be used for decryption. |
algorithm | Enum | AES_128_CTR | Encryption algorithm. Options: - AES_128_CTR (16-byte key) - AES_192_CTR (24-byte key) - AES_256_CTR (32-byte key) |
Example of disk configuration:
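A sketch of an encrypted disk over S3 with two keys given in hexadecimal form (the endpoint and both keys are placeholders; current_key_id selects the key used for encryption, while all listed keys can be used for decryption):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <disk_s3>
                <type>s3</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </disk_s3>
            <disk_s3_encrypted>
                <type>encrypted</type>
                <disk>disk_s3</disk>
                <algorithm>AES_128_CTR</algorithm>
                <key_hex id="0">00112233445566778899aabbccddeeff</key_hex>
                <key_hex id="1">ffeeddccbbaa99887766554433221100</key_hex>
                <current_key_id>1</current_key_id>
            </disk_s3_encrypted>
        </disks>
    </storage_configuration>
</clickhouse>
```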
Using local cache
It is possible to configure local cache over disks in storage configuration starting from version 22.3. For versions 22.3 - 22.7 cache is supported only for the s3 disk type. For versions >= 22.8 cache is supported for any disk type: S3, Azure, Local, Encrypted, etc. For versions >= 23.5 cache is supported only for remote disk types: S3, Azure, HDFS (unsupported). Cache uses the LRU cache policy.
Example of configuration for versions later or equal to 22.8:
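A sketch (placeholder endpoint; the cache path and max_size are examples):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3>
            <cache>
                <type>cache</type>
                <disk>s3</disk>
                <path>/var/lib/clickhouse/disks/s3_cache/</path>
                <max_size>10Gi</max_size>
            </cache>
        </disks>
        <policies>
            <s3_cache>
                <volumes>
                    <main>
                        <disk>cache</disk>
                    </main>
                </volumes>
            </s3_cache>
        </policies>
    </storage_configuration>
</clickhouse>
```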
Example of configuration for versions earlier than 22.8:
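A sketch of the older form, assuming the cache is configured via data_cache_enabled / data_cache_max_size settings on the s3 disk itself (placeholder endpoint; 10737418240 bytes = 10 GiB):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/clickhouse-data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
                <data_cache_enabled>1</data_cache_enabled>
                <data_cache_max_size>10737418240</data_cache_max_size>
            </s3>
        </disks>
        <policies>
            <s3_cache>
                <volumes>
                    <main>
                        <disk>s3</disk>
                    </main>
                </volumes>
            </s3_cache>
        </policies>
    </storage_configuration>
</clickhouse>
```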
File Cache disk configuration settings:
These settings should be defined in the disk configuration section.
Parameter | Type | Default | Description |
---|---|---|---|
path | String | - | Required. Path to the directory where cache will be stored. |
max_size | Size | - | Required. Maximum cache size in bytes or readable format (e.g., 10Gi ). Files are evicted using LRU policy when the limit is reached. Supports ki , Mi , Gi formats (since v22.10). |
cache_on_write_operations | Boolean | false | Enables write-through cache for INSERT queries and background merges. Can be overridden per query with enable_filesystem_cache_on_write_operations . |
enable_filesystem_query_cache_limit | Boolean | false | Enables per-query cache size limits based on max_query_cache_size . |
enable_cache_hits_threshold | Boolean | false | When enabled, data is cached only after being read multiple times. |
cache_hits_threshold | Integer | 0 | Number of reads required before data is cached (requires enable_cache_hits_threshold ). |
enable_bypass_cache_with_threshold | Boolean | false | Skips cache for large read ranges. |
bypass_cache_threshold | Size | 256Mi | Read range size that triggers cache bypass (requires enable_bypass_cache_with_threshold ). |
max_file_segment_size | Size | 8Mi | Maximum size of a single cache file in bytes or readable format. |
max_elements | Integer | 10000000 | Maximum number of cache files. |
load_metadata_threads | Integer | 16 | Number of threads for loading cache metadata at startup. |
Note: Size values support units like ki, Mi, Gi, etc. (e.g., 10Gi).
File Cache Query/Profile Settings
Setting | Type | Default | Description |
---|---|---|---|
enable_filesystem_cache | Boolean | true | Enables/disables cache usage per query, even when using a cache disk type. |
read_from_filesystem_cache_if_exists_otherwise_bypass_cache | Boolean | false | When enabled, uses cache only if data exists; new data won't be cached. |
enable_filesystem_cache_on_write_operations | Boolean | false (Cloud: true ) | Enables write-through cache. Requires cache_on_write_operations in cache config. |
enable_filesystem_cache_log | Boolean | false | Enables detailed cache usage logging to system.filesystem_cache_log . |
max_query_cache_size | Size | false | Maximum cache size per query. Requires enable_filesystem_query_cache_limit in cache config. |
skip_download_if_exceeds_query_cache | Boolean | true | Controls behavior when max_query_cache_size is reached: - true : Stops downloading new data - false : Evicts old data to make space for new data |
Cache configuration settings and cache query settings correspond to the latest ClickHouse version; for earlier versions some settings might not be supported.
Cache system tables
Table Name | Description | Requirements |
---|---|---|
system.filesystem_cache | Displays the current state of the filesystem cache. | None |
system.filesystem_cache_log | Provides detailed cache usage statistics per query. | Requires enable_filesystem_cache_log = true |
Cache commands
SYSTEM DROP FILESYSTEM CACHE (<cache_name>) (ON CLUSTER) -- the ON CLUSTER clause is only supported when no <cache_name> is provided.
SHOW FILESYSTEM CACHES -- show a list of filesystem caches which were configured on the server. (For versions less than or equal to 22.8 the command is named SHOW CACHES.)
DESCRIBE FILESYSTEM CACHE '<cache_name>' -- show cache configuration and some general statistics for a specific cache. The cache name can be taken from the SHOW FILESYSTEM CACHES command. (For versions less than or equal to 22.8 the command is named DESCRIBE CACHE.)
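A short usage sketch (the cache name s3_cache is a placeholder taken from the storage configuration):

```sql
-- list filesystem caches configured on the server
SHOW FILESYSTEM CACHES;

-- show configuration and general statistics for one cache
DESCRIBE FILESYSTEM CACHE 's3_cache';

-- drop all filesystem caches
SYSTEM DROP FILESYSTEM CACHE;
```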
| Cache current metrics | Cache asynchronous metrics | Cache profile events |
|---|---|---|
| FilesystemCacheSize | FilesystemCacheBytes | CachedReadBufferReadFromSourceBytes, CachedReadBufferReadFromCacheBytes |
| FilesystemCacheElements | FilesystemCacheFiles | CachedReadBufferReadFromSourceMicroseconds, CachedReadBufferReadFromCacheMicroseconds |
| | | CachedReadBufferCacheWriteBytes, CachedReadBufferCacheWriteMicroseconds |
| | | CachedWriteBufferCacheWriteBytes, CachedWriteBufferCacheWriteMicroseconds |
Using static Web storage (read-only)
This is a read-only disk. Its data is only read and never modified. A new table is loaded to this disk via an ATTACH TABLE query (see example below). Local disk is not actually used; each SELECT query will result in an HTTP request to fetch the required data. All modification of the table data will result in an exception, i.e. the following types of queries are not allowed: CREATE TABLE, ALTER TABLE, RENAME TABLE, DETACH TABLE and TRUNCATE TABLE.
Web storage can be used for read-only purposes. An example use is for hosting sample data, or for migrating data. There is a tool clickhouse-static-files-uploader, which prepares a data directory for a given table (SELECT data_paths FROM system.tables WHERE name = 'table_name'). For each table you need, you get a directory of files. These files can be uploaded to, for example, a web server with static files. After this preparation, you can load this table into any ClickHouse server via DiskWeb.
In this sample configuration (shown below):
- the disk is of type web
- the data is hosted at http://nginx:80/test1/
- a cache on local storage is used
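A sketch of such a configuration (the cache path and max_size are placeholders):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <web>
                <type>web</type>
                <endpoint>http://nginx:80/test1/</endpoint>
            </web>
            <cached_web>
                <type>cache</type>
                <disk>web</disk>
                <path>/var/lib/clickhouse/disks/cached_web_cache/</path>
                <max_size>100000000</max_size>
            </cached_web>
        </disks>
        <policies>
            <web>
                <volumes>
                    <main>
                        <disk>web</disk>
                    </main>
                </volumes>
            </web>
            <cached_web>
                <volumes>
                    <main>
                        <disk>cached_web</disk>
                    </main>
                </volumes>
            </cached_web>
        </policies>
    </storage_configuration>
</clickhouse>
```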
Storage can also be configured temporarily within a query, if a web dataset is not expected to be used routinely, see dynamic configuration and skip editing the configuration file.
A demo dataset is hosted on GitHub. To prepare your own tables for web storage, see the tool clickhouse-static-files-uploader. In this ATTACH TABLE query (sketched below) the UUID provided matches the directory name of the data, and the endpoint is the URL for the raw GitHub content.
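A sketch of such a query (the schema, UUID and endpoint are illustrative placeholders; the UUID must match the directory name of the uploaded data, and the endpoint must point at the raw static files):

```sql
ATTACH TABLE sample_data UUID '12345678-1234-1234-1234-123456789abc'
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS disk = disk(type=web, endpoint='https://raw.githubusercontent.com/example/static-web-data/main/web/');
```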
A ready test case. You need to add this configuration to config:
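A sketch of the configuration (the endpoint is a placeholder for wherever the prepared files are hosted):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <web>
                <type>web</type>
                <endpoint>https://example.com/static/web/</endpoint>
            </web>
        </disks>
        <policies>
            <web>
                <volumes>
                    <main>
                        <disk>web</disk>
                    </main>
                </volumes>
            </web>
        </policies>
    </storage_configuration>
</clickhouse>
```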
And then execute this query:
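A sketch of the query (hypothetical schema and placeholder UUID; storage_policy refers to the web policy defined above):

```sql
ATTACH TABLE sample_data UUID '12345678-1234-1234-1234-123456789abc'
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 'web';
```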
Required parameters
Parameter | Description |
---|---|
type | web . Otherwise the disk is not created. |
endpoint | The endpoint URL in path format. Endpoint URL must contain a root path to store data, where they were uploaded. |
Optional parameters
Parameter | Description | Default Value |
---|---|---|
min_bytes_for_seek | The minimal number of bytes to use seek operation instead of sequential read | 1 MB |
remote_fs_read_backoff_threashold | The maximum wait time when trying to read data for remote disk | 10000 seconds |
remote_fs_read_backoff_max_tries | The maximum number of attempts to read with backoff | 5 |
If a query fails with an exception DB:Exception Unreachable URL, then you can try to adjust the settings: http_connection_timeout, http_receive_timeout, keep_alive_timeout.
To get files for upload run:
clickhouse static-files-disk-uploader --metadata-path <path> --output-dir <dir>
(--metadata-path can be found in the query SELECT data_paths FROM system.tables WHERE name = 'table_name').

When loading files by endpoint, they must be loaded into the <endpoint>/store/ path, but the config must contain only the endpoint.
If the URL is not reachable on disk load when the server is starting up tables, then all errors are caught. If in this case there were errors, tables can be reloaded (become visible) via DETACH TABLE table_name -> ATTACH TABLE table_name. If metadata was successfully loaded at server startup, then tables are available straight away.
Use http_max_single_read_retries setting to limit the maximum number of retries during a single HTTP read.
Zero-copy Replication (not ready for production)
Zero-copy replication is possible, but not recommended, with S3 and HDFS (unsupported) disks. Zero-copy replication means that if the data is stored remotely on several machines and needs to be synchronized, then only the metadata is replicated (paths to the data parts), but not the data itself.
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.