
Create a pipeline

Follow the steps below to create a pipeline from the WeatherAPI.com API to DuckDB from scratch. The same steps work for any source and destination of your choice: run dlt init <source> <destination> and then build the pipeline for that API instead.

Please make sure you have installed dlt before following the steps below.

1. Initialize project

Create a new empty directory for your dlt project by running:

mkdir weatherapi_duckdb && cd weatherapi_duckdb

Start a dlt project with a pipeline template that loads data to DuckDB by running:

dlt init weatherapi duckdb

Install the dependencies necessary for DuckDB:

pip install -r requirements.txt

2. Add WeatherAPI.com API credentials

You will need to sign up for the WeatherAPI.com API.

Once you do this, you should see your API Key at the top of your user page.

Copy the value of the API key into .dlt/secrets.toml:

[sources]
api_secret_key = '<api key value>'

The secret name corresponds to the argument name in the source function. In the example below, api_secret_key gets its value from secrets.toml when weatherapi_source() is called.

@dlt.source
def weatherapi_source(api_secret_key=dlt.secrets.value):
    ...
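As a rough illustration of that name matching, the sketch below parses the same TOML with Python's standard tomllib module and looks up the api_secret_key entry in the [sources] section. It is not how dlt resolves secrets internally; lookup_secret and SECRETS_TOML are hypothetical names used only for this example.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport for older versions (pip install tomli)

# Same content as the .dlt/secrets.toml example, kept inline for illustration.
SECRETS_TOML = """
[sources]
api_secret_key = '<api key value>'
"""


def lookup_secret(toml_text, section, name):
    """Return the value stored under [section] name; raises KeyError if missing."""
    config = tomllib.loads(toml_text)
    return config[section][name]


# The key name in the file matches the argument name of the source function.
print(lookup_secret(SECRETS_TOML, "sources", "api_secret_key"))
```

A KeyError from a check like this would suggest the section or key name in secrets.toml does not match the argument name.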

Run the weatherapi.py pipeline script to test that authentication headers look fine:

python3 weatherapi.py

Your API key should be printed out to stdout along with some test data.

3. Request data from the WeatherAPI.com API

Replace the definition of the weatherapi_resource function in the weatherapi.py pipeline script with a call to the WeatherAPI.com API:

@dlt.resource(write_disposition="append")
def weatherapi_resource(api_secret_key=dlt.secrets.value):
    url = "https://api.weatherapi.com/v1/current.json"
    params = {
        "q": "NYC",
        "key": api_secret_key
    }
    response = requests.get(url, params=params)
    response.raise_for_status()
    yield response.json()
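To see which request the resource sends without making a network call, the following sketch builds an equivalent URL using only the standard library. The helper name build_current_weather_url and the key abc123 are hypothetical, and the exact encoding requests applies may differ slightly for unusual characters.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.weatherapi.com/v1/current.json"


def build_current_weather_url(api_key, query="NYC"):
    """Build the URL for the current-weather endpoint with the same two parameters."""
    params = {"q": query, "key": api_key}
    return f"{BASE_URL}?{urlencode(params)}"


# "abc123" is a placeholder, not a real API key.
print(build_current_weather_url("abc123"))
```

Printing the URL this way can help when debugging an error raised by raise_for_status(), for example by pasting it into a browser with a real key.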

Run the weatherapi.py pipeline script to test that the API call works:

python3 weatherapi.py

This should print out the weather in New York City right now.

4. Load the data

Remove the exit() call from the main function in weatherapi.py, so that running the python3 weatherapi.py command will now also run the pipeline:

if __name__ == '__main__':

    # configure the pipeline with your destination details
    pipeline = dlt.pipeline(
        pipeline_name='weatherapi',
        destination='duckdb',
        dataset_name='weatherapi_data'
    )

    # run the resource on its own to fetch data from the API
    data = list(weatherapi_resource())

    # print the data yielded from resource
    print(data)

    # run the pipeline with your parameters
    load_info = pipeline.run(weatherapi_source())

    # pretty print the information on data that was loaded
    print(load_info)

Run the weatherapi.py pipeline script to load data into DuckDB:

python3 weatherapi.py

Then run this command to check that the data was loaded:

dlt pipeline weatherapi show

This will open a Streamlit app that gives you an overview of the data loaded.
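The loaded tables can also be inspected directly from Python. The sketch below assumes that the pipeline wrote a database file named weatherapi.duckdb in the current directory and used weatherapi_data as the schema name; both are assumptions based on the pipeline_name and dataset_name above, so adjust them if your setup differs. The helper list_loaded_tables is a hypothetical name.

```python
from pathlib import Path


def list_loaded_tables(db_path="weatherapi.duckdb", dataset_name="weatherapi_data"):
    """Return the table names found in the dataset schema, or [] if the file is missing."""
    if not Path(db_path).exists():
        return []

    # Imported here so the missing-file case works even without duckdb installed.
    import duckdb

    con = duckdb.connect(db_path, read_only=True)
    try:
        rows = con.execute(
            "SELECT table_name FROM information_schema.tables "
            "WHERE table_schema = ? ORDER BY table_name",
            [dataset_name],
        ).fetchall()
    finally:
        con.close()
    return [row[0] for row in rows]


print(list_loaded_tables())
```

An empty list means either the file was not found at that path or no tables exist under that schema name.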

5. Next steps

Now that you have a working pipeline, you have options for what to learn next:

Create a pipeline with GPT-4

Create a dlt pipeline using the data source of your choice and let GPT-4 write the resource functions and help you debug the code.
This demo works on Codespaces, a development environment available for free to anyone with a GitHub account. You'll be asked to fork the demo repository, and from there the README guides you through the remaining steps.
The demo uses the Continue VSCode extension.

Off to codespaces!
