dltHub
+2,500
GitHub stars
+3,000
Community members
+80
Contributors
+1M
Downloads per month
OPEN SOURCE

data load tool (dlt): load data anywhere

dlt is an open source Python library that loads data from often messy data sources into well-structured, live datasets. Use dlt to automate all your tedious data engineering tasks, with features like schema inference, data normalization and incremental loading.
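
To make "schema inference" and "data normalization" concrete, here is a minimal standard-library sketch of the idea. It is not dlt's implementation: the helper names `flatten` and `infer_schema`, the double-underscore separator, and the sample records are assumptions chosen for illustration.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with a double underscore."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}__"))
        else:
            flat[name] = value
    return flat


def infer_schema(rows: list[dict[str, Any]]) -> dict[str, str]:
    """Map each column to the Python type name of its first non-null value."""
    schema: dict[str, str] = {}
    for row in rows:
        for column, value in row.items():
            if value is not None and column not in schema:
                schema[column] = type(value).__name__
    return schema


# Hypothetical messy input: nesting depth and fields differ between records.
raw = [
    {"id": 1, "user": {"name": "Ada", "address": {"city": "London"}}},
    {"id": 2, "user": {"name": "Grace"}, "score": 9.5},
]

rows = [flatten(record) for record in raw]
schema = infer_schema(rows)
print(schema)
```

The second record adds a `score` column that the first one lacks. Picking up such new columns automatically is the kind of schema evolution described above.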

Run anywhere

Run it wherever Python runs: on Airflow, in serverless functions, or in notebooks. dlt needs no external APIs, backends, or containers, and it scales on micro and large infrastructure alike.

Automated maintenance

Schema inference, schema evolution, and alerts, combined with short declarative code, keep maintenance simple.

Declarative

A user-friendly, declarative interface removes knowledge obstacles for beginners while empowering senior professionals.

Fully customizable

Customize our verified data sources, or any other part of the code, to suit your needs.

Verified Sources & Destinations

Our verified sources are the simplest way to start building your stack. Choose from 60+ fully customizable pre-built sources, such as any SQL database, Google Sheets, Salesforce, and others.

With our numerous destinations, you can load data to a local database, a warehouse, or a data lake. Choose from Snowflake, Databricks, and more.

Build custom sources

If dlt’s verified sources don’t fit your needs and an API is available, you can build your own custom source with the REST API source. Its declarative configuration saves you a lot of time that would otherwise go into writing custom code. If no API is available, you can build a custom source from scratch in Python.
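
As a rough illustration of what a declarative configuration can look like, the sketch below describes an API as a plain Python dictionary in the general shape the REST API source accepts (`client`, `resource_defaults`, `resources`). The base URL, resource names, and query parameter are hypothetical placeholders, and `resource_names` is a helper invented here only to inspect the dictionary.

```python
# Hypothetical API: the base URL and the resources below are placeholders.
config = {
    "client": {
        "base_url": "https://api.example.com/v1/",
    },
    # Defaults applied to every resource unless a resource overrides them.
    "resource_defaults": {
        "primary_key": "id",
        "write_disposition": "merge",
    },
    "resources": [
        # A bare string is shorthand: the resource name doubles as the path.
        "customers",
        # A dict allows per-resource settings such as query parameters.
        {
            "name": "orders",
            "endpoint": {
                "path": "orders",
                "params": {"status": "open"},
            },
        },
    ],
}


def resource_names(cfg: dict) -> list[str]:
    """Return the names of all resources declared in the config."""
    return [r if isinstance(r, str) else r["name"] for r in cfg["resources"]]


print(resource_names(config))

# With dlt installed, a dictionary like this would typically be passed to
# rest_api_source from dlt.sources.rest_api and run through a pipeline.
```

The point of the design is that adding another endpoint means adding one more entry to `resources` rather than writing new request and pagination code.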

We at Untitled Data Company create new data pipelines for our customers all the time. We are now using dlt's new REST API toolkit in consulting projects. The toolkit allows us to build data pipelines in very little time and very little code. This proves to be great for us as well as our customers as they can easily maintain the data pipelines by anyone who knows Python on their end.

Willi Müller
Co-Founder at Untitled Data Company

Sync your databases

Sync database tables from any of 100+ database engines to warehouses, vector databases, files, or custom reverse ETL functions. Benefit from schema inference and evolution, incremental loading, deduplication, SCD2 materializations, and more. Achieve the highest performance with the PyArrow and ConnectorX extraction engines. Simply specify the connection string and the destination you want to sync the data to, and dlt will take care of the rest.
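
The sketch below illustrates two of the ideas named above, incremental loading and deduplication by primary key, using only the standard library. It is a simplified model under stated assumptions, not how dlt implements them: the `customers` table, the `updated_at` cursor column, the in-memory `destination` dict, and the `sync` function are all invented for this example.

```python
import sqlite3

# Hypothetical source database with an "updated_at" column used as the cursor.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT, updated_at TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "2024-01-01"), (2, "Grace", "2024-01-02")],
)

destination: dict[int, dict] = {}  # stand-in for a destination table keyed by primary key
state = {"last_updated_at": ""}    # cursor value remembered between runs


def sync() -> int:
    """Load rows newer than the stored cursor and upsert them by primary key."""
    rows = source.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (state["last_updated_at"],),
    ).fetchall()
    for row_id, name, updated_at in rows:
        # Upsert: a changed row replaces the old version instead of duplicating it.
        destination[row_id] = {"id": row_id, "name": name, "updated_at": updated_at}
        state["last_updated_at"] = updated_at
    return len(rows)


first = sync()  # initial load reads every row

# Simulate changes in the source: one update and one new row.
source.execute(
    "UPDATE customers SET name = 'Grace H.', updated_at = '2024-01-03' WHERE id = 2"
)
source.execute("INSERT INTO customers VALUES (3, 'Linus', '2024-01-04')")

second = sync()  # reads only the changed and the new row
third = sync()   # nothing changed, so nothing is read
print(first, second, third, len(destination))
```

Each run reads only rows past the stored cursor, and the keyed upsert keeps one row per `id` even though customer 2 was extracted twice.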

Sync your files

Use dlt to retrieve any files you have stored on S3, Azure, GCS, and other buckets. Parse CSV, Parquet, JSON, PDF, XLS, and other formats efficiently, and benefit from all of dlt's features, such as schema inference, incremental loading, and composability with machine learning libraries that process your data on the fly.

Do the same at the destination end: pick your file format and storage layout, or use formats like Parquet, Delta tables, or Iceberg to easily create your own data lakes.

OpenAPI toolkit

Pull data from any API with an OpenAPI spec without writing any code. The OpenAPI toolkit generates dlt pipeline code to load the data into any destination of your choice.

Join the growing dltHub community