Command Line Interface
dlt init <source> <destination>
This command creates a new dlt pipeline script that loads data from the source to the destination. When you run the command:
- It creates basic project structure if the current folder is empty. Adds
- It checks if the source argument matches one of our verified sources and, if so, adds it to the project.
- If the source is unknown, it will use a generic template to get you started.
- It will rewrite the pipeline scripts to use your destination.
- It will create sample config and credentials in config.toml for the specified source and destination.
- It will create requirements.txt with the dependencies required by the source and destination. If one exists, it will print instructions on what to add to it.
This command can be used several times in the same folder to add more sources, destinations, and pipelines. It will also update the verified source code to the newest version if run again with an existing source name. You are warned if files will be overwritten or if the dlt version needs an upgrade to run a particular pipeline.
Specify your own "verified sources" repository.
You can use the --location <repo_url or local folder> option to specify your own repository with sources. Typically, you would fork ours and start customizing and adding sources, e.g., to use them for your team or organization. You can also specify a branch with --branch <name>, e.g., to test a version being developed.
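As an illustration, the two options can be combined in a single invocation. The minimal sketch below only assembles and prints the command line in Python; it does not run dlt. The fork URL and the branch name are hypothetical placeholders, and chess and duckdb are used merely as example source and destination names.

```python
import shlex
from typing import List, Optional


def build_init_command(
    source: str,
    destination: str,
    location: Optional[str] = None,
    branch: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `dlt init` with optional repository flags."""
    command = ["dlt", "init", source, destination]
    if location is not None:
        # --location points to your own repository (URL or local folder) with sources
        command += ["--location", location]
    if branch is not None:
        # --branch selects a branch of that repository
        command += ["--branch", branch]
    return command


# Hypothetical fork URL and branch name, for illustration only.
init_command = build_init_command(
    "chess",
    "duckdb",
    location="https://github.com/my-org/verified-sources.git",
    branch="my-feature",
)
print(shlex.join(init_command))
```

Passing the resulting list to a process runner is left out on purpose, so the sketch has no side effects.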
List all verified sources
dlt init --list-verified-sources
Shows all available verified sources and their short descriptions. For each source, it checks if your local dlt version requires an update and prints the relevant warning.
The dlt deploy command prepares your pipeline for deployment and gives you step-by-step instructions on how to accomplish it. To enable this functionality, please first execute
pip install "dlt[cli]"
which will add additional packages to the current environment.
💡 We ask you to install those dependencies separately to keep our core library small and make it work everywhere.
dlt deploy <script>.py github-action --schedule "*/30 * * * *"
GitHub Actions is a CI/CD runner that you can use basically for free.
You need to specify when the GitHub Action should run using a cron schedule expression. The command also takes additional flags: --run-on-push (default is False) and --run-manually (default is True). Remember to put the cron schedule in quotation marks, as in the example above.
For the chess.com API example above, you could deploy it with
dlt deploy chess.py github-action --schedule "*/30 * * * *"
Follow the Deploy a pipeline with GitHub Actions walkthrough to learn more.
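To see why the quotation marks around the cron schedule matter, here is a small sketch that uses Python's shlex module to split the command line the way a POSIX shell tokenizes it. It is only an illustration of argument splitting; it does not call dlt, and how dlt itself validates the schedule is not shown.

```python
import shlex

# Quoted form: the schedule stays a single argument.
quoted = shlex.split('dlt deploy chess.py github-action --schedule "*/30 * * * *"')

# Same command without quotation marks: the schedule falls apart into
# separate arguments (a real shell would additionally try to expand
# each `*` into file names).
unquoted = shlex.split("dlt deploy chess.py github-action --schedule */30 * * * *")

schedule = quoted[quoted.index("--schedule") + 1]
print(schedule)                      # the cron expression as one argument
print(len(schedule.split()))         # number of cron fields in the expression
print(len(unquoted) - len(quoted))   # extra arguments caused by missing quotes
```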
dlt deploy <script>.py airflow-composer
Google Composer is a managed Airflow environment provided by Google.
Follow the Deploy a pipeline with Airflow walkthrough to learn more.
It will create an Airflow DAG for your pipeline script that you should customize. The DAG uses the dlt Airflow wrapper to make this process trivial.
It displays the environment variables with secrets that you must add to Airflow.
You'll also get a cloudbuild file to sync the GitHub repository with the dag folder of your Airflow Composer instance.
💡 The command targets Composer users, but the generated DAG and instructions will work with any Airflow instance.
Use the dlt pipeline command to inspect the pipeline working directory, the tables and data in the destination, and to check for problems with data loading.
Show tables and data in the destination
dlt pipeline <pipeline name> show
Generates and launches a simple Streamlit app that you can use to inspect the schemas and data in the destination, as well as your pipeline state and loading status and stats. It should be executed from the same folder from which you ran the pipeline script, so that it can access the destination credentials. Requires streamlit to be installed.
Get the pipeline information
dlt pipeline <pipeline name> info
Displays the content of the pipeline's working directory: dataset name, destination, list of schemas, resources in schemas, list of completed and normalized load packages, and optionally the pipeline state set by the resources during the extraction process.
Get the load package information
dlt pipeline <pipeline name> load-package <load id>
Shows information on a load package with the given load_id. The load_id parameter defaults to the most recent package. Package information includes its state (COMPLETED/PROCESSED) and a list of all jobs in the package with their statuses, file sizes, types, and, in the case of failed jobs, the error messages from the destination. With the verbose flag set (dlt pipeline -v ...), you can also see the list of all tables and columns created at the destination during the loading of that package.
List all failed jobs
dlt pipeline <pipeline name> failed-jobs
This command scans all the load packages looking for failed jobs and then displays information on the files that got loaded and the failure message from the destination.
Get the last run trace
dlt pipeline <pipeline name> trace
Displays the trace of the last pipeline run, containing the start date of the run, the elapsed time, and the same information for all the steps (extract, normalize, and load). If any of the steps failed, you'll see the message of the exception that caused the problem. Successful load steps will display the load info instead.
Sync pipeline with the destination
dlt pipeline <pipeline name> sync
This command will remove the pipeline working directory with all pending packages, unsynchronized state changes, and schemas, and retrieve the last synchronized data from the destination. If you drop the dataset the pipeline is loading to, this command results in a complete reset of the pipeline state.
In the case of a pipeline without a working directory, the command may be used to create one from the destination. To do that, you need to pass the dataset name and destination name to the CLI and provide the credentials to connect to the destination (e.g., in .dlt/secrets.toml) placed in the folder where you execute the pipeline sync command.
Selectively drop tables and reset state
dlt pipeline <pipeline name> drop [resource_1] [resource_2]
Drops tables generated by the selected resources and resets the state associated with them. Mainly used to force a full refresh on selected tables. In the example below, we drop all tables generated by the repo_events resource in the github pipeline:
dlt pipeline github_events drop repo_events
dlt will inform you of the names of the dropped tables and the resource state slots that will be reset:
About to drop the following data in dataset airflow_events_1 in destination dlt.destinations.duckdb:
Selected schema:: github_repo_events
Selected resource(s):: ['repo_events']
Table(s) to drop:: ['issues_event', 'fork_event', 'pull_request_event', 'pull_request_review_event', 'pull_request_review_comment_event', 'watch_event', 'issue_comment_event', 'push_event__payload__commits', 'push_event']
Resource(s) state to reset:: ['repo_events']
Source state path(s) to reset:: 
Do you want to apply these changes? [y/N]
As a result of the command above:
- All the indicated tables will be dropped in the destination. Note that dlt drops the child tables as well.
- All the indicated tables will be removed from the indicated schema.
- The state for the resource repo_events was found and will be reset.
- New schema and state will be stored in the destination.
The drop command accepts several advanced settings:
- You can use regexes to select resources. Prepend the re: string to indicate a regex pattern. The example below will select all resources starting with repo:
dlt pipeline github_events drop "re:^repo"
- You can drop all tables in indicated schema:
dlt pipeline chess drop --drop-all
- You can indicate additional state slots to reset by passing a JsonPath to the source state. In the example below, we reset the archives slot in the source state:
dlt pipeline chess_pipeline drop --state-paths archives
This will select the archives key in the source state.
❗ This command is still experimental and the interface will most probably change. Resetting the resource state assumes that the dlt state layout is followed.
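The regex selection of resources described above can be sketched in plain Python. This is not dlt's implementation; it is a minimal illustration that assumes a selector with the re: prefix is treated as a regular expression searched within each resource name, and that any other selector is an exact name. The helper select_resources and the resource names are hypothetical.

```python
import re
from typing import List

REGEX_PREFIX = "re:"


def select_resources(selector: str, resource_names: List[str]) -> List[str]:
    """Return the resource names picked by a plain name or a "re:"-prefixed pattern."""
    if selector.startswith(REGEX_PREFIX):
        pattern = re.compile(selector[len(REGEX_PREFIX):])
        # Assumption: the pattern is searched within the name, so anchors
        # such as ^ decide where it has to match.
        return [name for name in resource_names if pattern.search(name)]
    # Without the prefix, the selector is treated as an exact resource name.
    return [name for name in resource_names if name == selector]


# Hypothetical resource names, for illustration only.
resources = ["repo_events", "repo_issues", "user_events"]
print(select_resources("re:^repo", resources))
print(select_resources("repo_events", resources))
```

Quoting the selector on the command line, as in "re:^repo", keeps the shell from interpreting characters of the pattern.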
List all pipelines on the local machine
dlt pipeline --list-pipelines
This command lists all the pipelines executed on the local machine with their working data in the default pipelines folder.
Drop pending and partially loaded packages
dlt pipeline <pipeline name> drop-pending-packages
Removes all extracted and normalized packages in the pipeline's working dir.
dlt keeps extracted and normalized load packages in the pipeline working directory. When the run method is called, it will attempt to normalize and load pending packages first. The command above removes such packages. Note that the pipeline state is not reverted to the state at which the deleted packages were created. Using dlt pipeline ... sync is recommended if your destination supports state sync.
Show stack traces
If the command fails and you want to see the full stack trace, add --debug just after the dlt executable:
dlt --debug pipeline github info
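The position of the flag matters because global options are usually parsed before the subcommand. The toy parser below, built with Python's argparse, illustrates this general behavior; it is an assumption-laden sketch and not dlt's actual argument parser, and the argument names pipeline_name and operation are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """A toy parser with a global --debug flag and a `pipeline` subcommand."""
    parser = argparse.ArgumentParser(prog="dlt")
    parser.add_argument("--debug", action="store_true")
    subcommands = parser.add_subparsers(dest="command")
    pipeline = subcommands.add_parser("pipeline")
    pipeline.add_argument("pipeline_name")
    pipeline.add_argument("operation")
    return parser


parser = build_parser()

# Flag placed right after the executable: recognized as the global option.
args, leftover = parser.parse_known_args(["--debug", "pipeline", "github", "info"])
print(args.debug, leftover)

# Flag placed after the subcommand: the toy subparser does not know it,
# so it ends up among the unrecognized arguments.
late_args, late_leftover = parser.parse_known_args(
    ["pipeline", "github", "info", "--debug"]
)
print(late_args.debug, late_leftover)
```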