- 4,000+ GitHub stars
- 5,000+ Community members
- 120+ Contributors
- 2M+ Downloads per month
OPEN SOURCE
data load tool (dlt): load data anywhere
dlt (data load tool) is an open source Python library that loads data from often messy data sources into well-structured, live datasets. It automates all your tedious data engineering tasks, with features like schema inference, data normalization and incremental loading.
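As a taste of how little code this takes, here is a minimal sketch that loads a list of Python dicts into DuckDB; the pipeline, dataset, and table names are illustrative placeholders:

```python
# Minimal dlt pipeline sketch: load Python dicts into DuckDB.
# All names below (quickstart, mydata, users) are illustrative.
import dlt

data = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]

# dlt infers the schema, normalizes the data, and creates the table.
pipeline = dlt.pipeline(
    pipeline_name="quickstart",
    destination="duckdb",
    dataset_name="mydata",
)
load_info = pipeline.run(data, table_name="users")
print(load_info)
```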
Run anywhere
Run it wherever Python runs: on Airflow, in serverless functions, in notebooks. No external APIs, backends, or containers required; it scales on micro and large infrastructure alike.
Automated maintenance
With schema inference, schema evolution, and alerts, plus short declarative code, maintenance stays simple.
Declarative
A user-friendly, declarative interface that removes knowledge obstacles for beginners while empowering senior professionals.
Fully customizable
Customize our verified data sources, or any part of the code to suit your needs.

Verified Sources & Destinations
Our verified sources are the simplest way to get started with building your stack. Choose from our 60+ fully customizable pre-built sources, such as any SQL database, Google Sheets, Salesforce, and more.
With our numerous destinations, you can load data into a local database, a warehouse, or a data lake. Choose from Snowflake, Databricks, and more.
Build custom sources
If dlt’s verified sources don’t fit your needs, you can build your own custom source with the REST API source whenever an API is available. Thanks to its declarative configuration, you’ll save a lot of time that would otherwise go into writing custom code. If no API is available, you can build a custom source from scratch in Python. A sketch of the declarative style follows below.
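Here is a minimal sketch of such a declarative REST API source; the base URL and resource names are hypothetical placeholders, not a real API:

```python
# Sketch of a declarative REST API source. The base URL and resource
# names are hypothetical placeholders.
import dlt
from dlt.sources.rest_api import rest_api_source

source = rest_api_source({
    "client": {
        "base_url": "https://api.example.com/v1/",
    },
    "resources": [
        "posts",     # loaded into the `posts` table
        "comments",  # loaded into the `comments` table
    ],
})

pipeline = dlt.pipeline(pipeline_name="rest_demo", destination="duckdb")
pipeline.run(source)
```

Because the source is a plain configuration dict, endpoints, pagination, and authentication can be added or changed without rewriting extraction code.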
We at Untitled Data Company create new data pipelines for our customers all the time, and we now use dlt's new REST API toolkit in consulting projects. The toolkit lets us build data pipelines in very little time and with very little code. This is great for us as well as for our customers: anyone on their end who knows Python can easily maintain the pipelines.

- Willi Müller
- Co-Founder at Untitled Data Company

Sync your databases
Sync database tables from any of 100+ database engines to warehouses, vector databases, files, or custom reverse ETL functions. Benefit from schema inference and evolution, incremental loading, deduplication, SCD2 materializations, and more. Achieve the highest performance with the PyArrow and ConnectorX extraction engines. Simply specify the connection string and the destination you want to sync the data to, and dlt (data load tool) takes care of the rest, as in the sketch below.
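A minimal sketch of a database sync, assuming the sql_database source that ships with dlt; the connection string, table names, and destination are placeholders:

```python
# Sketch of syncing database tables with dlt's sql_database source.
# The connection string and table names below are placeholders.
import dlt
from dlt.sources.sql_database import sql_database

# Reflects the schema from the database; pick the tables to sync.
source = sql_database(
    "postgresql://user:password@localhost:5432/shop",
    table_names=["orders", "customers"],
)

pipeline = dlt.pipeline(pipeline_name="db_sync", destination="snowflake")
pipeline.run(source)
```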
Sync your files
Use dlt (data load tool) to retrieve any files you have stored in S3, Azure, GCS, and other buckets. Parse CSV, Parquet, JSON, PDF, XLS, and other formats efficiently. Process your data on the fly with features such as schema inference and incremental loading, and compose freely with machine learning libraries.
Do the same at the destination end: pick a file format such as Parquet and a storage layout, or use table formats like Delta Lake or Iceberg to easily create your own data lakes.
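A sketch of loading CSV files from a bucket, assuming dlt's filesystem source; the bucket URL, glob, and resource name are placeholders:

```python
# Sketch of reading CSV files from a bucket with dlt's filesystem source.
# The bucket URL and glob pattern are placeholders.
import dlt
from dlt.sources.filesystem import filesystem, read_csv

# List matching files, then parse them as CSV on the fly.
files = filesystem(bucket_url="s3://my-bucket/data/", file_glob="*.csv")
reader = (files | read_csv()).with_name("events")

pipeline = dlt.pipeline(pipeline_name="files_demo", destination="duckdb")
pipeline.run(reader)
```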


OpenAPI toolkit
Pull data from any API with an OpenAPI spec without writing any code. The OpenAPI toolkit generates dlt pipeline code to load the data into any destination of your choice.
What they're saying
#dlt might be the next and only tool you need for data loading.
I have been working with #dlt for about a year now, basically since its very early stage, and I have only positive things to say about it.
While it may not solve every loading problem in data engineering, it solves the most common use cases. I guarantee there is no other tool on the market that solves your integration challenges the way dlt does.

- Don Bosco van Hoi
- Co-Founder / Owner @ Mothership GmbH
Every time a new requirement comes up, I am quite amazed that the dlthub team has thought about these scenarios and built in possibilities for custom logic to be implemented where required. After all, it’s just Python code, and as long as we feed the data to dlt in a way it can work with, it can run pretty much anything. Kudos, dlthub.com!

- Arjun Anandkumar
- Senior Data Platform Engineer @ Norlys