Loading Data from Attio to AWS S3 Using dlt in Python
Loading data from Attio to AWS S3 using the open-source Python library dlt allows teams to efficiently manage and organize their data. Attio is a collaborative workspace designed for teams to manage relationships, track deals, and structure their activities. AWS S3, on the other hand, is a robust filesystem destination that stores data in formats such as JSONL, Parquet, or CSV, facilitating the creation of data lakes. By leveraging dlt, users can seamlessly transfer data from Attio to AWS S3, ensuring streamlined data management and accessibility. More details about Attio can be found here.
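Since the filesystem destination writes each table as files in one of the formats above, here is a quick standard-library-only illustration of the JSONL layout (one JSON object per line); the records are invented, Attio-like placeholders:

```python
import json

# Hypothetical records, standing in for rows extracted from Attio.
records = [
    {"id": "p1", "name": "Ada Lovelace", "deal_stage": "won"},
    {"id": "p2", "name": "Grace Hopper", "deal_stage": "open"},
]

# JSONL: one JSON document per line -- the same layout the
# filesystem destination uses for its "jsonl" loader file format.
jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl)
```

Each line is independently parseable, which is what makes JSONL convenient for append-style data lake loads.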
dlt Key Features
- Easy to get started: dlt is a Python library designed to be simple to use and easy to understand. Type `pip install dlt` and you are ready to go. Read more
- Pipeline Metadata: Leverage metadata to provide governance capabilities, including tracking data loads and facilitating data lineage and traceability. Learn more
- Schema Enforcement and Curation: Enforce and curate schemas to ensure data consistency and quality, maintaining data integrity and facilitating standardized data handling practices. Learn more
- Scaling and Finetuning: Offers several mechanisms and configuration options to scale up and fine-tune pipelines, including parallel execution and memory buffer adjustments. Read more
- Governance Support: Robust governance support through pipeline metadata utilization, schema enforcement and curation, and schema change alerts. Learn more
Getting started with your pipeline locally with dlt-init-openapi
0. Prerequisites
dlt and dlt-init-openapi require Python 3.9 or higher. Additionally, you need to have the pip package manager installed, and we recommend using a virtual environment to manage your dependencies. You can learn more about preparing your computer for dlt in our installation reference.
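Assuming a Unix-like shell, a minimal environment setup might look like this (the directory name `.venv` is an arbitrary choice):

```shell
# create an isolated environment (dlt requires Python 3.9+)
python3 -m venv .venv
# activate it for the current shell session
. .venv/bin/activate
# confirm the interpreter meets the version requirement
python -c 'import sys; assert sys.version_info >= (3, 9)'
```

Everything installed in the following steps then stays inside `.venv` instead of your system Python.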
1. Install dlt and dlt-init-openapi
First, install the dlt-init-openapi CLI tool:
pip install dlt-init-openapi
The dlt-init-openapi CLI is a powerful generator that turns any OpenAPI spec into a dlt source for ingesting data from that API. The quality of the generated source depends on how well the API is designed and how accurate the OpenAPI spec is. You may need to make tweaks to the generated code; you can learn more about this here.
# generate pipeline
# NOTE: add_limit adds a global limit, you can remove this later
# NOTE: you will need to select which endpoints to render, you
# can just hit Enter and all will be rendered.
dlt-init-openapi attio --url https://raw.githubusercontent.com/dlt-hub/openapi-specs/main/open_api_specs/Business/attio_api.yaml --global-limit 2
cd attio_pipeline
# install generated requirements
pip install -r requirements.txt
The last command will install the required dependencies for your pipeline. The dependencies are listed in requirements.txt:
dlt>=0.4.12
You now have the following folder structure in your project:
attio_pipeline/
├── .dlt/
│ ├── config.toml # configs for your pipeline
│ └── secrets.toml # secrets for your pipeline
├── rest_api/ # The rest api verified source
│ └── ...
├── attio/
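The secrets.toml file shown above is where the AWS S3 connection details for the filesystem destination belong. Per dlt's filesystem destination documentation, a typical layout is as follows (the bucket name and keys are placeholders you must replace):

```toml
[destination.filesystem]
bucket_url = "s3://<your-bucket>/<optional-prefix>"

[destination.filesystem.credentials]
aws_access_key_id = "<your access key id>"
aws_secret_access_key = "<your secret access key>"
```

With these values in place, running the generated pipeline against the filesystem destination writes the Attio data to your S3 bucket.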