Initialize a pipeline

This guide walks you through creating and initializing a dlt pipeline in dltHub Workspace, whether manually, with agentic help, or from one of the verified sources maintained by the dltHub team.

Overview

A dlt pipeline moves data from a source (like an API or database) into a destination (like DuckDB, Snowflake, or Iceberg). Initializing a pipeline is the first step in the data workflow. You can create one in two CLI-based ways:

| Method | Command | Best for |
| --- | --- | --- |
| Manual | dlthub pipeline init <source> <destination> | Developers who prefer manual setup |
| Verified source | dlthub pipeline init <verified_source> <destination> | Prebuilt, tested connectors from the community and dltHub team |

Outside of a workspace (plain OSS dlt), the same scaffold is reachable as dlt init <source> <destination>. Inside a dltHub workspace, dlthub pipeline init is the canonical entry point—it adds the pipeline to the current workspace.
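To make the source-to-destination flow concrete, here is a minimal Python sketch of what a pipeline script does once it is scaffolded. It is an illustration, not the generated template: the `players` generator, its sample rows, and the pipeline, dataset, and table names are all hypothetical. `dlt` is imported inside the loader so the stand-in source can be inspected without dlt installed.

```python
from typing import Iterator


def players() -> Iterator[dict]:
    """Stand-in source: yields rows as plain dicts (hypothetical sample data)."""
    yield {"id": 1, "name": "magnus"}
    yield {"id": 2, "name": "hikaru"}


def load_players():
    """Load the rows into a local DuckDB database through a dlt pipeline."""
    # Imported here so the generator above stays usable without dlt installed.
    import dlt

    pipeline = dlt.pipeline(
        pipeline_name="players_pipeline",  # hypothetical name
        destination="duckdb",
        dataset_name="players_data",  # hypothetical dataset name
    )
    # dlt infers the table schema from the yielded dicts.
    return pipeline.run(players(), table_name="players")
```

Calling `load_players()` requires dlt with DuckDB support installed. It returns dlt's load info object, which can be printed to see what was loaded.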

Step 0: Install dlt with workspace support

Before you start, make sure you have followed the installation instructions and have a dltHub workspace initialized. The fastest way is:

uvx dlthub-start@latest

This scaffolds a workspace with .dlt/.workspace already set, the AI toolkits vendored, and dlt[hub] synced. See the installation guide for the alternative paths (adding to an existing project, or enabling workspace mode by hand).

dltHub Workspace is a unified environment for developing, running, and maintaining data pipelines—from local development to production.

More about dlt Workspace

Step 1: Initialize a custom pipeline

Manual setup (standard workflow)

A lightweight, code-first approach ideal for developers comfortable with Python.

dlthub pipeline init {source_name} duckdb

For example:

dlthub pipeline init my_github_pipeline duckdb

It scaffolds the pipeline template—a minimal starter project with a single Python script that shows three quick ways to load data into DuckDB using dlt:

  • fetch JSON from a public REST API (chess.com as an example) with requests,
  • read a public CSV with pandas, and
  • pull rows from a SQL database via SQLAlchemy.
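The three approaches listed above share one shape: each produces rows that dlt can load into a table. The sketch below is not the generated template. To stay runnable offline it swaps in standard-library stand-ins (an inline CSV string instead of pandas, an in-memory SQLite table instead of SQLAlchemy), and all sample data, table names, and pipeline names are hypothetical. The chess.com URL pattern is an assumption about the public API and that function needs network access, so it is only called from the loader.

```python
import csv
import io
import sqlite3
from typing import Iterator

# Hypothetical sample data standing in for a public CSV file.
CSV_TEXT = "id,name\n1,alice\n2,bob\n"


def csv_rows(text: str = CSV_TEXT) -> Iterator[dict]:
    """Yield one dict per CSV row (the template uses pandas for this step)."""
    yield from csv.DictReader(io.StringIO(text))


def sql_rows() -> Iterator[dict]:
    """Yield rows from an in-memory SQLite table (the template uses SQLAlchemy)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE items (id INTEGER, label TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b")])
    try:
        for row in conn.execute("SELECT id, label FROM items ORDER BY id"):
            yield dict(row)
    finally:
        conn.close()


def api_rows(player: str = "magnuscarlsen") -> Iterator[dict]:
    """Fetch one JSON document from a public REST API (needs network access)."""
    import requests  # third-party; imported lazily so the offline helpers run without it

    # Assumed endpoint pattern for the chess.com public API.
    response = requests.get(f"https://api.chess.com/pub/player/{player}", timeout=10)
    response.raise_for_status()
    yield response.json()


def load_all():
    """Run all three loads into DuckDB; requires dlt, requests, and network access."""
    import dlt

    pipeline = dlt.pipeline(
        pipeline_name="playground",  # hypothetical name
        destination="duckdb",
        dataset_name="playground_data",  # hypothetical dataset name
    )
    pipeline.run(csv_rows(), table_name="csv_people")
    pipeline.run(sql_rows(), table_name="sql_items")
    return pipeline.run(api_rows(), table_name="player")
```

Note that the CSV stand-in yields every value as a string, whereas the SQLite stand-in preserves integer types. Each helper is a plain generator of dicts, which is why the same `pipeline.run` call works for all three.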

The file also includes an optional GitHub REST client example (a @dlt.resource + @dlt.source) that can use a token from .dlt/secrets.toml, but will work unauthenticated at low rate limits. It’s meant as a hands-on playground you can immediately run and then adapt into a real pipeline.
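As a rough sketch of the `@dlt.resource` plus `@dlt.source` pattern described above, the example below requests GitHub issues and sends an `Authorization` header only when a token is available. It is an assumption-laden illustration, not the template's code: the function names, the `access_token` argument name, and the repository URL are hypothetical choices, and the exact key dlt looks up in `.dlt/secrets.toml` depends on those names. The decorated functions are built inside a factory so that `dlt` and `requests` are imported only when the source is actually needed.

```python
from typing import Iterator, Optional


def auth_headers(access_token: Optional[str]) -> dict:
    """Build request headers; add Authorization only when a token is available."""
    headers = {"Accept": "application/vnd.github+json"}
    if access_token:
        headers["Authorization"] = f"Bearer {access_token}"
    return headers


def build_github_source():
    """Return a dlt source function; dlt and requests are imported lazily."""
    import dlt
    import requests

    @dlt.resource(write_disposition="replace")
    def issues(access_token: Optional[str] = dlt.secrets.value) -> Iterator[list]:
        # Hypothetical target repository; a single unpaginated request for brevity.
        url = "https://api.github.com/repos/dlt-hub/dlt/issues"
        response = requests.get(url, headers=auth_headers(access_token), timeout=10)
        response.raise_for_status()
        yield response.json()

    @dlt.source
    def github_source(access_token: Optional[str] = dlt.secrets.value):
        # Optional hint: a missing secret is intended to resolve to None.
        return issues(access_token=access_token)

    return github_source
```

With dlt installed, `dlt.pipeline(pipeline_name="github_demo", destination="duckdb").run(build_github_source()())` would load the issues. The pipeline name here is hypothetical, and a real pipeline would also follow pagination links.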

Learn how to build your own dlt pipeline with the dlt Fundamentals course.

Agentic setup

A collaborative AI-human workflow that integrates dlt with AI editors and agents.

Start with the /find-source skill to describe your data source in natural language—the assistant identifies a verified source or researches the API, then chains into pipeline scaffolding.

Read more about running a pipeline

Next steps: Deploy and scale

Once your pipeline runs locally:

This demo works on Codespaces, a development environment available for free to anyone with a GitHub account. You'll be asked to fork the demo repository, and from there the README guides you through the remaining steps.
The demo uses the Continue VS Code extension.

Off to codespaces!
