Loading Data from ClickHouse Cloud to Snowflake using dlt in Python
ClickHouse Cloud is a high-performance, scalable, cloud-based data warehousing solution designed for real-time analytics. It enables businesses to run complex queries on large datasets with exceptional speed and efficiency. Snowflake is a cloud-based data warehousing platform designed to enable the storage, processing, and analysis of large volumes of data. This documentation provides a guide to loading data from ClickHouse Cloud to Snowflake using the open-source Python library dlt. With dlt, users can seamlessly manage data transfers, ensuring efficient and secure data handling. For more information about ClickHouse Cloud, visit clickhouse.com/cloud.
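To show where this guide is headed, the snippet below sketches what a finished pipeline run looks like in Python. It is a minimal, hypothetical example: the clickhouse_cloud module and clickhouse_cloud_source function are placeholders for the source code that dlt-init-openapi generates later in this guide.

```python
import dlt

# Hypothetical import: the actual module and source function are
# generated by dlt-init-openapi in the steps below.
from clickhouse_cloud import clickhouse_cloud_source

# Create a pipeline that loads into Snowflake;
# credentials are read from .dlt/secrets.toml.
pipeline = dlt.pipeline(
    pipeline_name="clickhouse_cloud",
    destination="snowflake",
    dataset_name="clickhouse_cloud_data",
)

# Run the source and print a summary of what was loaded.
load_info = pipeline.run(clickhouse_cloud_source())
print(load_info)
```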
dlt Key Features
- Snowflake Integration: Seamlessly integrate with Snowflake for data warehousing. Learn more
- Authentication Options: Supports multiple authentication methods including password, key pair, and external authentication; see the config sketch after this list. Learn more
- Staging Support: Utilize S3, GCS, or Azure Blob Storage for staging data before loading into Snowflake. Learn more
- Setup Guide: Step-by-step instructions to set up your Snowflake environment. Learn more
- Governance Features: Robust governance support through metadata, schema enforcement, and schema change alerts. Learn more
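For example, the Snowflake connection and authentication settings above typically live in the .dlt/secrets.toml file of your pipeline project. The following is a sketch of the password-based credentials layout with placeholder values; your database, account host, warehouse, and role names will differ:

```toml
[destination.snowflake.credentials]
database = "dlt_data"          # target database
username = "loader"            # Snowflake user
password = "<your-password>"   # placeholder; never commit real secrets
host = "<account_identifier>"  # your Snowflake account identifier
warehouse = "COMPUTE_WH"       # warehouse to run loads on
role = "DLT_LOADER_ROLE"       # role with write access
```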
Getting started with your pipeline locally with dlt-init-openapi
0. Prerequisites
dlt and dlt-init-openapi require Python 3.9 or higher. Additionally, you need to have the pip package manager installed, and we recommend using a virtual environment to manage your dependencies. You can learn more about preparing your computer for dlt in our installation reference.
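For example, you can create and activate a virtual environment with Python's built-in venv module (on Windows, activate with .\venv\Scripts\activate instead):

```sh
python -m venv venv
source venv/bin/activate
```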
1. Install dlt and dlt-init-openapi
First, you need to install the dlt-init-openapi CLI tool.
pip install dlt-init-openapi
The dlt-init-openapi CLI is a powerful generator that turns any OpenAPI spec into a dlt source for ingesting data from that API. The quality of the generated source depends on how well the API is designed and how accurate the OpenAPI spec you are using is. You may need to tweak the generated code; you can learn more about this here.
# generate pipeline
# NOTE: add_limit adds a global limit, you can remove this later
# NOTE: you will need to select which endpoints to render, you
# can just hit Enter and all will be rendered.
dlt-init-openapi clickhouse_cloud --url https://raw.githubusercontent.com/dlt-hub/openapi-specs/main/open_api_specs/Business/click_house_cloud.yaml --global-limit 2
cd clickhouse_cloud_pipeline
# install generated requirements
pip install -r requirements.txt
The last command will install the required dependencies for your pipeline. The dependencies are listed in the requirements.txt:
dlt>=0.4.12
You now have the following folder structure in your project:
clickhouse_cloud_pipeline/
├── .dlt/
│ ├── config.toml # configs for your pipeline
│ └── secrets.toml # secrets for your pipeline