Run a pipeline
Follow the steps below to run your pipeline script, see your loaded data and tables, inspect pipeline state, and trace and handle the most common problems.
1. Write and execute pipeline script
Once you have created a new pipeline or added and verified a source, you will want to use it to load data. You need to write (or customize) a pipeline script, like the one below, which loads data from the chess.com API:
```py
import dlt

# assumed import: the chess.com source is defined in a local chess.py module
from chess import chess_source

if __name__ == "__main__":
    pipeline = dlt.pipeline(pipeline_name="chess_pipeline", destination="duckdb", dataset_name="games_data")
    # get data for a few famous players
    data = chess_source(["magnuscarlsen", "rpragchess"], start_month="2022/11", end_month="2022/12")
    load_info = pipeline.run(data)
```
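Save the script (as `chess_pipeline.py`, for example; the filename is only an illustration) and execute it with `python chess_pipeline.py`.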
The `run` method will extract data from the chess API, normalize it into tables, and then load it into duckdb in the form of one or more load packages. It returns a `load_info` object that, when printed, displays the pipeline and dataset names, the ids of the load packages, and, optionally, information on failed jobs. Add the following line to your script:
```py
print(load_info)
```
Running the script will then print output like this:
```text
Pipeline chess_pipeline completed in 1.80 seconds
1 load package(s) were loaded to destination duckdb and into dataset games_data
The duckdb destination used duckdb:////home/user-name/src/dlt_tests/dlt-cmd-test-3/chess_pipeline.duckdb location to store data
Load package 1679931001.985323 is COMPLETED and contains no failed jobs
```
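When the pipeline runs inside a script or an orchestrator, you may want to act on this information programmatically rather than read the printed summary. A minimal sketch, assuming the `LoadInfo` attributes `dataset_name`, `loads_ids`, and `has_failed_jobs` and the `raise_on_failed_jobs()` method available in recent dlt versions (verify against the version you have installed):

```py
# inspect the LoadInfo returned by pipeline.run() instead of printing it;
# attribute and method names below are assumed from recent dlt versions
print(load_info.dataset_name)   # dataset the packages were loaded into
print(load_info.loads_ids)      # ids of the load packages from this run
if load_info.has_failed_jobs:
    # raise an exception so a scheduler or orchestrator marks the run as failed
    load_info.raise_on_failed_jobs()
```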