Introduction

What is dlt?

dlt is an open-source library that you can add to your Python scripts to load data from various and often messy data sources into well-structured, live datasets. To get started, install it with:

pip install dlt

Unlike other solutions, dlt needs no backends or containers. Simply import dlt in a Python file or a Jupyter Notebook cell, and create a pipeline to load data into any of the supported destinations. You can load data from any source that produces Python data structures, including APIs, files, databases, and more. dlt also supports building a custom destination, which you can use for reverse ETL.

The library will create or update tables, infer data types, and handle nested data automatically. Here is an example pipeline:

import dlt
from dlt.sources.helpers import requests

# Create a dlt pipeline that will load
# chess player data to the DuckDB destination
pipeline = dlt.pipeline(
    pipeline_name="chess_pipeline", destination="duckdb", dataset_name="player_data"
)
# Grab some player data from Chess.com API
data = []
for player in ["magnuscarlsen", "rpragchess"]:
    response = requests.get(f"https://api.chess.com/pub/player/{player}")
    response.raise_for_status()
    data.append(response.json())
# Extract, normalize, and load the data
load_info = pipeline.run(data, table_name="player")

Copy this example into a Python file or a Jupyter Notebook cell. To make it work with the DuckDB destination, first install the duckdb dependency (the default dlt installation is minimal):

pip install "dlt[duckdb]"

Now run your Python file or Notebook cell.

How does it work? The library extracts data from a source (here: the Chess.com REST API), inspects its structure to create a schema, then structures, normalizes, and verifies the data, and finally loads it into a destination (here: DuckDB, into the database schema player_data and the table player).
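
To make the "inspect the structure, create a schema, normalize" steps more concrete, here is a minimal pure-Python sketch of the general idea. It is not dlt's implementation: the helper names flatten and infer_schema, the "__" separator, and the sample records are all assumptions made up for illustration.

```python
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], parent_key: str = "", sep: str = "__") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep` (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects so they become prefixed columns
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


def infer_schema(rows: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each column to the type name of its first non-null value (hypothetical helper)."""
    schema: Dict[str, str] = {}
    for row in rows:
        for column, value in row.items():
            if value is not None and column not in schema:
                schema[column] = type(value).__name__
    return schema


# Made-up sample records, loosely shaped like an API response
players = [
    {"username": "player_one", "stats": {"rating": 2800, "games": 120}},
    {"username": "player_two", "stats": {"rating": 2700}, "title": "GM"},
]

rows = [flatten(player) for player in players]
schema = infer_schema(rows)
print(schema)
```

In this sketch the second record introduces a title column that the first one lacks, and the inferred schema simply grows to include it. That gives a rough picture of what schema inference over semi-structured data involves, under the assumptions stated above.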

Why use dlt?

  • Automated maintenance - schema inference, schema evolution, and alerts, combined with short declarative code, keep maintenance simple.
  • Run it where Python runs - on Airflow, serverless functions, or notebooks. It needs no external APIs, backends, or containers, and it scales on micro and large infrastructure alike.
  • User-friendly, declarative interface - it removes knowledge obstacles for beginners while empowering senior professionals.

Getting started with dlt

  1. Dive into our Getting started guide for a quick intro to the essentials of dlt.
  2. Play with the Google Colab demo. This is the simplest way to see dlt in action.
  3. Read the Tutorial to learn how to build a pipeline that loads data from an API.
  4. Check out the How-to guides for recipes on common use cases for creating, running, and deploying pipelines.
  5. Ask us on Slack if you have any questions about use cases or the library.

Join the dlt community

  1. Give the library a ⭐ and check out the code on GitHub.
  2. Ask questions and share how you use the library on Slack.
  3. Report problems and make feature requests here.
