dlt: the data loading library for Python

4,000+: GitHub stars
5,000+: Community members
120+: Contributors
2M+: Downloads per month

OPEN SOURCE

data load tool (dlt): load data anywhere

dlt (data load tool) is an open source Python library that loads data from often messy data sources into well-structured, live datasets. It automates all your tedious data engineering tasks, with features like schema inference, data normalization and incremental loading.
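As a rough illustration of the paragraph above, here is a minimal sketch of a dlt pipeline that loads a few nested Python records into a local DuckDB file. The sample rows, the pipeline name, the dataset name and the table name are all made up for this example; only `dlt.pipeline` and `pipeline.run` come from dlt itself.

```python
# Hypothetical sample data: nested, slightly messy records.
rows = [
    {"id": 1, "name": "Alice", "address": {"city": "Berlin"}},
    {"id": 2, "name": "Bob", "address": {"city": "Paris"}, "tags": ["vip"]},
]


def load_rows(data):
    """Load a list of dicts into a local DuckDB database with dlt."""
    # Imported here so the sample data can be inspected without dlt installed.
    import dlt

    pipeline = dlt.pipeline(
        pipeline_name="quickstart",  # hypothetical pipeline name
        destination="duckdb",
        dataset_name="demo_data",  # hypothetical dataset (schema) name
    )
    # dlt infers the schema from the data and creates the tables for it.
    return pipeline.run(data, table_name="customers")
```

Calling `print(load_rows(rows))` should print a summary of the load. It assumes dlt was installed with the DuckDB extra, for example `pip install "dlt[duckdb]"`.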

Run anywhere

Run it wherever Python runs: on Airflow, in serverless functions, or in notebooks. dlt needs no external APIs, backends, or containers, and it scales from small to large infrastructure.

Automated maintenance

Schema inference, schema evolution and alerts, combined with short declarative code, keep maintenance simple.
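To make "schema inference and evolution" concrete, the following is a conceptual sketch in plain Python. It is not dlt's implementation: the type names and the `infer_type` and `evolve_schema` helpers are invented for this illustration. It only shows the idea of deriving column types from values and adding columns when new fields appear.

```python
from typing import Any


def infer_type(value: Any) -> str:
    """Map a Python value to an illustrative column type name."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "text"


def evolve_schema(schema: dict[str, str], row: dict[str, Any]) -> list[str]:
    """Add a column for every unseen field and return the new column names."""
    added = []
    for name, value in row.items():
        # None carries no type information, so the column is added later.
        if name not in schema and value is not None:
            schema[name] = infer_type(value)
            added.append(name)
    return added


schema: dict[str, str] = {}
first = evolve_schema(schema, {"id": 1, "name": "Alice"})
second = evolve_schema(schema, {"id": 2, "name": "Bob", "score": 9.5})
print(first, second)  # ['id', 'name'] ['score']
```

The second row introduces a `score` field, so the schema grows by one column while the existing columns stay untouched.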

Declarative

A user-friendly, declarative interface that lowers the barrier for beginners while empowering senior professionals.

Fully customizable

Customize our verified data sources, or any other part of the code, to suit your needs.

Verified Sources & Destinations

Our verified sources are the simplest way to start building your stack. Choose from 60+ fully customizable pre-built sources, such as SQL databases, Google Sheets, Salesforce and others.

With our numerous destinations, you can load data to a local database, a warehouse or a data lake. Choose from Snowflake, Databricks and more.

Explore our ecosystem

Build custom sources

If dlt’s verified sources don’t fit your needs, you can build your own custom source. If an API is available, use the REST API source: its declarative configuration saves you much of the time you would otherwise spend writing custom code. If no API is available, you can build a custom source from scratch in Python.
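A declarative REST API configuration might look like the sketch below. The base URL, the resource names and the query parameter describe a hypothetical API, and the example assumes a dlt version that ships the REST API source as `dlt.sources.rest_api`.

```python
# Declarative description of a hypothetical API. The base URL, resource
# names and query parameter are placeholders, not a real service.
config = {
    "client": {"base_url": "https://api.example.com/v1/"},
    "resources": [
        # A plain string is used as both the resource name and the path.
        "customers",
        {
            "name": "orders",
            "endpoint": {"path": "orders", "params": {"status": "open"}},
        },
    ],
}


def load_api(cfg):
    """Build a source from the config and load it into local DuckDB."""
    # Imported here so the config can be inspected without dlt installed.
    import dlt
    from dlt.sources.rest_api import rest_api_source

    source = rest_api_source(cfg)
    pipeline = dlt.pipeline(
        pipeline_name="rest_api_demo",  # hypothetical pipeline name
        destination="duckdb",
        dataset_name="api_data",  # hypothetical dataset name
    )
    return pipeline.run(source)
```

Each entry in `resources` should end up as its own table, so adding an endpoint is a matter of adding an entry to the configuration.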

Read the REST API docs
Read the docs for adding a verified source

Willi Müller
Co-Founder at Untitled Data Company

Sync your databases

Sync database tables from any of 100+ database engines to warehouses, vector databases, files or custom reverse ETL functions. Benefit from schema inference and evolution, incremental loading, deduplication, SCD2 materializations and more. Speed up extraction with the PyArrow and ConnectorX engines. Simply specify the connection string and the destination you want to sync the data to, and dlt will take care of the rest.
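A database sync along these lines could be sketched as follows. The connection string, the table names and the `updated_at` cursor column are hypothetical, and the example assumes a dlt version that ships the SQL source as `dlt.sources.sql_database`.

```python
# Hypothetical connection string, tables and cursor columns.
CONNECTION_STRING = "postgresql://loader:secret@localhost:5432/shop"
TABLES = ["customers", "orders"]
CURSOR_COLUMNS = {"orders": "updated_at"}  # table name -> incremental cursor


def sync_tables(connection_string, tables, cursor_columns):
    """Copy the given tables into a local DuckDB database."""
    # Imported here so the settings can be inspected without dlt installed.
    import dlt
    from dlt.sources.sql_database import sql_database

    # The "pyarrow" backend extracts rows as Arrow tables.
    source = sql_database(connection_string, table_names=tables, backend="pyarrow")

    # On later runs, load only rows whose cursor value has advanced.
    for table, column in cursor_columns.items():
        source.resources[table].apply_hints(
            incremental=dlt.sources.incremental(column)
        )

    pipeline = dlt.pipeline(
        pipeline_name="db_sync_demo",  # hypothetical pipeline name
        destination="duckdb",
        dataset_name="shop_replica",  # hypothetical dataset name
    )
    return pipeline.run(source)
```

Running `sync_tables(CONNECTION_STRING, TABLES, CURSOR_COLUMNS)` needs a reachable database, a matching driver such as `psycopg2`, and `pyarrow` for the chosen backend.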

Get started with the SQL source

Sync your files

Use dlt to retrieve files stored on S3, Azure, GCS and other buckets. Parse CSV, Parquet, JSON, PDF, XLS and other formats efficiently. Process your data on the fly with features such as schema inference and incremental loading, and combine it with your machine learning libraries.

Do the same at the destination end: pick your file format (for example Parquet) and storage layout, or use table formats like Delta or Iceberg to easily create your own data lakes.
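Reading files from a bucket could look roughly like this. The bucket URL, the glob pattern and the table name are placeholders, and the example assumes a dlt version that ships the filesystem source as `dlt.sources.filesystem`.

```python
# Hypothetical bucket location and file pattern.
BUCKET_URL = "s3://example-bucket/exports"
FILE_GLOB = "**/*.csv"


def load_csv_files(bucket_url, file_glob):
    """List matching files, parse them as CSV and load them into DuckDB."""
    # Imported here so the settings can be inspected without dlt installed.
    import dlt
    from dlt.sources.filesystem import filesystem, read_csv

    # filesystem() lists the files; piping into read_csv() parses them.
    files = filesystem(bucket_url=bucket_url, file_glob=file_glob)
    reader = (files | read_csv()).with_name("exports")  # hypothetical table name

    pipeline = dlt.pipeline(
        pipeline_name="files_demo",  # hypothetical pipeline name
        destination="duckdb",
        dataset_name="file_data",  # hypothetical dataset name
    )
    return pipeline.run(reader)
```

For an S3 URL, credentials and the `s3fs` package need to be available, and the CSV reader relies on `pandas`. A local `file://` URL is an easy way to try the same code first.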

Get started with the filesystem source

OpenAPI toolkit

Pull data from any API with an OpenAPI spec without writing any code. The OpenAPI toolkit generates dlt pipeline code to load the data into any destination of your choice.

View on GitHub

What they're saying

Join the community

Join the growing dltHub community

Become a Contributor
Join our Slack