
Load Data from Trello to The Local Filesystem Using dlt in Python

Need help deploying these pipelines, or figuring out how to run them in your data stack?

Join our Slack community or book a call with our support engineer Violetta.

Trello is a visual tool that empowers your team to manage any type of project, workflow, or task tracking. You can add files, checklists, or even automation to streamline your processes. With the open-source Python library dlt, you can load data from Trello to the local filesystem. The local filesystem destination stores data in a local folder, allowing you to easily create data lakes. You can store data as JSONL, Parquet, or CSV. For more information about Trello, visit their website.

dlt Key Features

  • Pipeline Metadata: dlt pipelines utilize metadata to provide governance capabilities, including load IDs for data lineage and traceability.
  • Schema Enforcement and Curation: dlt allows users to enforce and curate schemas, ensuring data consistency and quality.
  • Schema Evolution: Get notified about schema changes in source data, allowing for proactive governance.
  • Scaling and Fine-tuning: dlt offers mechanisms and configuration options to scale and fine-tune pipelines, such as parallel processing and memory buffer adjustments.
  • Provider Key Formats: dlt translates standard formats into provider-specific formats, supporting both TOML and environment variables.
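
To make the last point concrete, here is a minimal sketch of how a dotted TOML-style key corresponds to an environment variable name, assuming dlt's convention of upper-casing each section and joining sections with double underscores. The helper to_env_var and the key sources.trello.api_key are hypothetical names used only for illustration; they are not part of dlt or of the generated project.

```python
def to_env_var(toml_key: str) -> str:
    """Convert a dotted TOML-style key into an environment variable name.

    Illustrative helper only (not a dlt function): each section is
    upper-cased and the sections are joined with double underscores.
    """
    return "__".join(part.upper() for part in toml_key.split("."))


# "sources.trello.api_key" is a hypothetical key chosen for this example.
print(to_env_var("sources.trello.api_key"))  # prints SOURCES__TRELLO__API_KEY
```

The same value could then be supplied either in .dlt/secrets.toml under the dotted key or through the corresponding environment variable.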

Getting started with your pipeline locally

OpenAPI Source Generator dlt-init-openapi

This walkthrough uses the dlt-init-openapi generator CLI tool; you can read more about it here. The code generated by this tool uses the dlt rest_api verified source; its docs are here.

0. Prerequisites

dlt and dlt-init-openapi require Python 3.9 or higher. You also need the pip package manager installed, and we recommend using a virtual environment to manage your dependencies. You can learn more about preparing your computer for dlt in our installation reference.
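
If you want to verify these prerequisites before installing anything, the following standard-library sketch checks the interpreter version and whether a virtual environment is active. The function names are hypothetical, and the virtual environment check assumes an environment created with venv or virtualenv; other environment managers may behave differently.

```python
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the prerequisites above


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def in_virtual_environment() -> bool:
    """Return True if the interpreter appears to run inside a virtual environment.

    In a venv, sys.prefix points at the environment, while
    sys.base_prefix points at the base Python installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python version supported:", python_is_supported())
print("Virtual environment active:", in_virtual_environment())
```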

1. Install dlt and dlt-init-openapi

First, you need to install the dlt-init-openapi CLI tool.

pip install dlt-init-openapi

The dlt-init-openapi CLI is a generator that turns any OpenAPI spec into a dlt source for ingesting data from that API. The quality of the generated source depends on how well the API is designed and how accurate the OpenAPI spec is. You may need to tweak the generated code; you can learn more about this here.

# generate pipeline
# NOTE: add_limit adds a global limit, you can remove this later
# NOTE: you will need to select which endpoints to render, you
# can just hit Enter and all will be rendered.
dlt-init-openapi trello --url https://raw.githubusercontent.com/dlt-hub/openapi-specs/main/open_api_specs/Business/trello.yaml --global-limit 2
cd trello_pipeline
# install generated requirements
pip install -r requirements.txt

The last command installs the dependencies required by your pipeline, which are listed in requirements.txt:

dlt>=0.4.12

You now have the following folder structure in your project:

trello_pipeline/
├── .dlt/
│ ├── config.toml # configs for your pipeline
│ └── secrets.toml # secrets for your pipeline
├── rest_api/ # The rest api verified source
│ └── ...
├── trello/
│ └── __init__.py # TODO: possibly tweak this file
├── trello_pipeline.py # your main pipeline script
├── requirements.txt # dependencies for your pipeline
└── .gitignore # ignore files for git (not required)
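
As a quick sanity check that the generator produced the layout shown above, the following sketch lists any expected entries that are missing. The helper missing_paths is a hypothetical name, and the list of paths is copied from the tree above (the optional .gitignore is left out).

```python
from pathlib import Path

# Entries taken from the folder structure shown above.
EXPECTED = [
    ".dlt/config.toml",
    ".dlt/secrets.toml",
    "rest_api",
    "trello/__init__.py",
    "trello_pipeline.py",
    "requirements.txt",
]


def missing_paths(project_dir):
    """Return the expected entries that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


# Run from the directory that contains trello_pipeline/;
# an empty list means every expected entry was found.
print(missing_paths("trello_pipeline"))
```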

1.1. Tweak trello/__init__.py

This file contains the generated configuration of your rest_api. You can continue with the next steps and leave it as is, but you might want to come back here and make adjustments if you need your rest_api source set