- 2,500+ GitHub stars
- 2,500+ Community members
- 75+ Contributors
- 700k+ Downloads per month
OPEN SOURCE
data load tool (dlt): load data anywhere
dlt is an open source Python library that loads data from often messy data sources into well-structured, live datasets. Use dlt to automate all your tedious data engineering tasks, with features like schema inference, data normalization and incremental loading.
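As a rough illustration of what that looks like in practice, here is a minimal sketch that loads two irregular records into a local DuckDB file. The sample rows, the pipeline, dataset, and table names are all made up for this example, and it assumes dlt is installed with the DuckDB extra (`pip install "dlt[duckdb]"`).

```python
from typing import Any, Dict, Iterator

# Hypothetical sample records: the rows do not share the same fields,
# and they contain a nested dict and a nested list.
ROWS = [
    {"id": 1, "name": "Ada", "address": {"city": "London"}},
    {"id": 2, "name": "Grace", "languages": ["COBOL", "FLOW-MATIC"]},
]


def messy_rows() -> Iterator[Dict[str, Any]]:
    """Yield the sample records one by one, as a real extractor might."""
    yield from ROWS


def load_rows() -> Any:
    """Load the sample records into a local DuckDB database with dlt."""
    # Imported here so the rest of the sketch can be read and run
    # without dlt installed.
    import dlt

    pipeline = dlt.pipeline(
        pipeline_name="quickstart",  # illustrative name
        destination="duckdb",
        dataset_name="people_data",  # illustrative name
    )
    # dlt infers column types from the data and normalizes nested
    # structures before loading them into the "people" table.
    return pipeline.run(
        messy_rows(),
        table_name="people",
        write_disposition="replace",
    )
```

Calling `load_rows()` returns load information that can be printed to inspect what was written.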
Run anywhere
Run dlt wherever Python runs: on Airflow, in serverless functions, or in notebooks. It needs no external APIs, backends, or containers, and it scales on small and large infrastructure alike.
Automated maintenance
Schema inference, schema evolution, alerts, and short declarative code keep maintenance simple.
Declarative
A user-friendly, declarative interface that lowers the barrier to entry for beginners while empowering senior professionals.
Fully customizable
Customize our verified data sources, or any other part of the code, to suit your needs.
Verified Sources & Destinations
Our verified sources are the simplest way to start building your stack. Choose from 60+ fully customizable pre-built sources, including SQL databases, Google Sheets, and Salesforce.
Our destinations let you load data into a local database, a warehouse, or a data lake. Choose from Snowflake, Databricks, and more.
Build custom sources
If dlt’s verified sources don’t fit your needs and an API is available, you can build a custom source with the REST API source. Its declarative configuration saves you from writing much of the custom code. If no API is available, you can build a custom source from scratch in Python.
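A declarative REST API source might be sketched as follows. The base URL, the endpoint names (`posts`, `comments`), and the `postId` query parameter describe a hypothetical API, not a real one, and the sketch assumes a dlt version that ships `dlt.sources.rest_api` along with the DuckDB extra.

```python
from typing import Any, Dict


def build_rest_config(base_url: str) -> Dict[str, Any]:
    """Describe a hypothetical API as plain data instead of request code."""
    return {
        "client": {"base_url": base_url},
        "resources": [
            # A bare string is shorthand: resource name and path are the same.
            "posts",
            # A dict form allows per-endpoint settings such as query params.
            {
                "name": "comments",
                "endpoint": {"path": "comments", "params": {"postId": 1}},
            },
        ],
    }


def load_from_api(base_url: str) -> Any:
    """Turn the config into a dlt source and load it into DuckDB."""
    # Imported here so the config above can be built without dlt installed.
    import dlt
    from dlt.sources.rest_api import rest_api_source

    source = rest_api_source(build_rest_config(base_url))
    pipeline = dlt.pipeline(
        pipeline_name="rest_api_example",  # illustrative name
        destination="duckdb",
        dataset_name="api_data",  # illustrative name
    )
    return pipeline.run(source)
```

Keeping the configuration in its own function makes it easy to review or test separately from the load step.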
Sync your databases
Sync database tables from any of 100+ database engines to warehouses, vector databases, files, or custom reverse ETL functions. Benefit from schema inference and evolution, incremental loading, deduplication, SCD2 materializations, and more. For faster extraction, use the PyArrow or ConnectorX backends. Simply specify the connection string and the destination you want to sync the data to; dlt takes care of the rest.
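The connection-string workflow could look roughly like the sketch below. The function name, pipeline name, and dataset name are invented for illustration. It assumes a dlt version that includes `dlt.sources.sql_database`, plus SQLAlchemy, a driver for your database, PyArrow, and the DuckDB extra.

```python
from typing import Any, List


def sync_tables(
    connection_string: str,
    table_names: List[str],
    backend: str = "pyarrow",
) -> Any:
    """Copy the given tables from a SQL database into a local DuckDB file."""
    # Imported here so the function can be defined without dlt installed.
    import dlt
    from dlt.sources.sql_database import sql_database

    # The source reflects the table schemas from the database; the backend
    # selects the extraction engine (for example "sqlalchemy", "pyarrow",
    # or "connectorx").
    source = sql_database(
        credentials=connection_string,
        table_names=table_names,
        backend=backend,
    )
    pipeline = dlt.pipeline(
        pipeline_name="db_sync",  # illustrative name
        destination="duckdb",
        dataset_name="replica",  # illustrative name
    )
    # "replace" reloads the tables in full on every run; other write
    # dispositions, such as "merge", are configured on the source or its
    # resources.
    return pipeline.run(source, write_disposition="replace")
```

A call might look like `sync_tables("postgresql://user:password@host:5432/shop", ["orders", "customers"])`, where the connection string and table names are placeholders.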
Sync your files
Use dlt to retrieve files stored on S3, Azure, GCS, and other buckets. Parse CSV, Parquet, JSON, PDF, XLS, and other formats efficiently, and benefit from dlt features like schema inference, incremental loading, and composability with machine learning libraries to process your data on the fly.
Do the same at the destination end: pick your file format (such as Parquet) and storage layout, or use table formats like Delta or Iceberg to easily create your own data lakes.
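Both ends can be combined in one pipeline, as in this sketch that reads CSV files from one location and writes Parquet files to another. The function name, pipeline name, dataset name, and resource name are made up for illustration. It assumes dlt is installed with its filesystem and Parquet dependencies, and that credentials for any remote bucket are configured separately.

```python
from typing import Any


def load_csv_files(
    bucket_url: str,
    target_url: str,
    file_glob: str = "*.csv",
) -> Any:
    """Read CSV files from one location and write them out as Parquet."""
    # Imported here so the function can be defined without dlt installed.
    import dlt
    from dlt.sources.filesystem import filesystem, read_csv

    # List the matching files, then pipe them through the CSV reader.
    files = filesystem(bucket_url=bucket_url, file_glob=file_glob)
    rows = (files | read_csv()).with_name("csv_rows")  # illustrative name

    pipeline = dlt.pipeline(
        pipeline_name="files_sync",  # illustrative name
        destination=dlt.destinations.filesystem(bucket_url=target_url),
        dataset_name="lake",  # illustrative name
    )
    # Ask the loader to write Parquet files at the destination.
    return pipeline.run(rows, loader_file_format="parquet")
```

Both URLs can point at local folders (for example `file:///tmp/in` and `file:///tmp/out`, used here as placeholders) or at cloud buckets.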
OpenAPI toolkit
Pull data from any API with an OpenAPI spec without writing any code. The OpenAPI toolkit generates dlt pipeline code to load the data into any destination of your choice.