Loading Data from Zuora to Neon Serverless Postgres with dlt in Python
We will use the dlt PostgreSQL destination to connect to Neon Serverless Postgres. You can get the connection string for your Neon Serverless Postgres database as described in the Neon Serverless Postgres Docs.
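For example, once you have that connection string, you can pass it directly to the postgres destination when creating a pipeline. This is a minimal sketch; the connection string below is a placeholder, so substitute the real one from your Neon console:

import dlt

# Placeholder Neon connection string; copy the real one from the Neon console.
# Neon requires TLS connections, hence sslmode=require.
neon_dsn = "postgresql://<user>:<password>@<endpoint>.neon.tech/<dbname>?sslmode=require"

pipeline = dlt.pipeline(
    pipeline_name="zuora",
    destination=dlt.destinations.postgres(neon_dsn),
    dataset_name="zuora_data",
)

Alternatively, you can put the same credentials in .dlt/secrets.toml instead of passing them in code.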
This documentation provides a detailed guide for loading data from Zuora to Neon Serverless Postgres using the dlt library. Zuora is a subscription management platform that helps businesses manage their subscription-based services. Neon Serverless Postgres is a serverless platform for the PostgreSQL database, designed to facilitate the development of reliable and scalable applications. The open-source dlt library simplifies the process of data extraction, transformation, and loading (ETL). This guide will walk you through the steps required to set up and execute a data pipeline, ensuring efficient data handling and integration between Zuora and Neon Serverless Postgres. For more details on Zuora, please visit Zuora's website.
dlt Key Features
- Pipeline Metadata: dlt pipelines leverage metadata to provide governance capabilities. This metadata includes load IDs, which consist of a timestamp and pipeline name. Load IDs enable incremental transformations and data vaulting by tracking data loads and facilitating data lineage and traceability; see the sketch after this list. Read more
- Schema Enforcement and Curation: dlt empowers users to enforce and curate schemas, ensuring data consistency and quality. Schemas define the structure of normalized data and guide the processing and loading of data. Read more
- Schema Evolution: dlt enables proactive governance by alerting users to schema changes, so stakeholders can review and validate them, update downstream processes, or perform impact analysis. Read more
- Scalability via Iterators, Chunking, and Parallelization: dlt offers scalable data extraction by leveraging iterators, chunking, and parallelization, enabling efficient processing of large datasets. Read more
- Authentication Types: the Snowflake destination accepts three authentication types: password authentication, key pair authentication, and external authentication. Read more
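The load ID mechanism described under Pipeline Metadata can be observed directly from a run. Below is a minimal, hedged sketch: the pipeline name, table name, and sample rows are made up for illustration, and it assumes Postgres credentials are already configured in .dlt/secrets.toml.

import dlt

# Minimal sketch: every pipeline.run() produces load packages, each tagged
# with a load ID; the same IDs are persisted to the _dlt_loads table.
pipeline = dlt.pipeline(
    pipeline_name="metadata_demo",  # illustrative name
    destination="postgres",
    dataset_name="demo_data",
)
load_info = pipeline.run([{"id": 1}, {"id": 2}], table_name="example")
print(load_info.loads_ids)  # load IDs recorded for this run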
Getting started with your pipeline locally
0. Prerequisites
dlt and dlt-init-openapi require Python 3.9 or higher. Additionally, you need to have the pip package manager installed, and we recommend using a virtual environment to manage your dependencies. You can learn more about preparing your computer for dlt in our installation reference.
1. Install dlt and dlt-init-openapi
First, install the dlt-init-openapi CLI tool.
pip install dlt-init-openapi
The dlt-init-openapi CLI is a powerful generator that turns any OpenAPI spec into a dlt source for ingesting data from that API. The quality of the generated source depends on how well the API is designed and how accurate the OpenAPI spec is. You may need to tweak the generated code; you can learn more about this here.
# generate pipeline
# NOTE: add_limit adds a global limit, you can remove this later
# NOTE: you will need to select which endpoints to render, you
# can just hit Enter and all will be rendered.
dlt-init-openapi zuora --url https://raw.githubusercontent.com/dlt-hub/openapi-specs/main/open_api_specs/Business/zuora.yaml --global-limit 2
cd zuora_pipeline
# install generated requirements
pip install -r requirements.txt
The last command installs the required dependencies for your pipeline, which are listed in requirements.txt:
dlt>=0.4.12
You now have the following folder structure in your project:
zuora_pipeline/
├── .dlt/
│ ├── config.toml # configs for your pipeline
│ └── secrets.toml # secrets for your pipeline
├── rest_api/ # The rest api verified source
│ └── ...
├── zuora/
│ └── __init__.py # TODO: possibly tweak this file
├── zuora_pipeline.py # your main pipeline script
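You can now run the pipeline with python zuora_pipeline.py. The generated zuora_pipeline.py roughly follows the pattern sketched below; the source function name zuora_source and other details are assumptions based on the generator's conventions, and the destination is switched to postgres so the data lands in Neon:

import dlt
from zuora import zuora_source  # generated source; the exact name may differ

# Sketch of the generated main script: run the Zuora source into Neon Postgres.
pipeline = dlt.pipeline(
    pipeline_name="zuora_pipeline",
    destination="postgres",  # reads Neon credentials from .dlt/secrets.toml
    dataset_name="zuora_data",
)
load_info = pipeline.run(zuora_source())
print(load_info)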