
The REST API integration cost: How AI + dlt is finally making it bearable

  • Aman Gupta,
    Jr. Data Engineer

There's a hidden cost in every REST API integration. No, it's not on your cloud bill. You won't see it in a KPI dashboard either. There's no budget review or Jira epic tracking this.

It's the REST API integration overhead. This cost quietly steals your focus, one schema mismatch at a time.

Every time you want to move data from SaaS Tool A to Warehouse B, you pay it. And it's never a one-click checkout. It's an unavoidable detour involving pagination, unpredictable schema changes, and rate limiting.

The root cause is semi-structured JSON data (with unknown schemas), a weakly typed interchange format that clashes with strongly typed analytical destinations. Maintenance adds another layer: homebrewed scripts and custom code that's often messy, undocumented, and challenging to decipher.
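
To make that clash concrete, here's a minimal sketch (with hypothetical data and table names) of how dlt infers a typed schema from weakly typed JSON and evolves it when a new field shows up:

import dlt

# Hypothetical JSON records, as an API might return them
rows = [
    {"id": 1, "ordered_at": "2017-01-01T00:00:00Z", "amount": 9.99},
    # A new field appears mid-stream - a classic source of schema mismatches
    {"id": 2, "ordered_at": "2017-01-02T12:30:00Z", "amount": 4.5, "coupon": "JAFFLE10"},
]

pipeline = dlt.pipeline(
    pipeline_name="schema_demo",
    destination="duckdb",
    dataset_name="demo",
)
pipeline.run(rows, table_name="orders")

# The inferred schema has typed columns, including the late-arriving 'coupon'
print(pipeline.default_schema.to_pretty_yaml())

Instead of you hand-writing DDL and patching it on every change, the typed schema is inferred and migrated for you.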

This hidden cost isn't just about initial setup; it's the ongoing cognitive load required to manage continuous uncertainty.

Removing the cognitive load and complexity with dlt + AI

Here’s an example of how you can build faster, with less complexity and no tech debt: use an LLM to accelerate your development, and let dlt self-maintain the pipelines.

Here’s the step-by-step:

1. Refer to the dlt docs for setting up Cursor.

2. Run dlt ai setup cursor to auto-generate the applicable project rules for REST API sources. This comes in handy when working from OpenAPI specs or legacy code.

You can review the rules in .cursor/rules or via Cursor Settings → Rules.

Here’s a short video of how we did it.


3. Next, we used the following prompt:

Hey!

I need a dlt REST API pipeline built using this OpenAPI spec:

👉 https://fast-api-jaffle-shop-jz2mh.ondigitalocean.app/openapi.json

The base URL is:

https://jaffle-shop.dlthub.com/api/v1

Can you include the following endpoints in the pipeline?

- customers, orders, products, supplies, stores

A few things to note:

- Check how pagination works for each endpoint. Use curl to hit the endpoints and see what kind of paginator they're using.
- If there's a way to load data incrementally (like using a created_at or updated_at filter), definitely take advantage of it.
- I only need data for the first 10 days of January 2017, so the pipeline should pull just that slice.

Let’s call the final Python file my_pipeline.py.

Ping me if anything’s unclear or if an endpoint behaves weirdly.
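
As an aside, the pagination check that the prompt asks for boils down to a probe like the one below (a sketch using Python's requests rather than curl; the exact Link header contents depend on the API):

import requests

# Probe one endpoint to see how it paginates. If the API uses RFC 5988
# Link headers, the response will carry a rel="next" URL.
resp = requests.get(
    "https://jaffle-shop.dlthub.com/api/v1/customers",
    params={"page_size": 5},
)
print(resp.headers.get("Link"))  # e.g. '<...page=2>; rel="next"' for header-link pagination
print(resp.json()[:1])           # first record of the top-level JSON array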


4. With a few iterations, Cursor produced code that worked as expected:

  • Correct pagination logic based on the API.
  • Data loaded only for the specified time frame.
  • Incremental loading correctly implemented on the orders endpoint.

Here’s the code it produced:

import dlt
from dlt.sources.rest_api import rest_api_source

# Define the base URL for the Jaffle Shop API
BASE_URL = "https://jaffle-shop.dlthub.com/api/v1"

# Configure the dlt REST API source
# No authentication required for this API
jaffle_shop_source = rest_api_source(
    {
        "client": {
            "base_url": BASE_URL,
            # Default pagination: Header Link Paginator
            # This is used by all endpoints in this API
            "paginator": {
                "type": "header_link",
                "links_next_key": "next",  # Default value, matches the API's Link header
            },
            # The default page size parameter is set in resource_defaults below
        },
        "resource_defaults": {
            "primary_key": "id",  # Assuming 'id' is the primary key for most resources
            "write_disposition": "merge", # Merge on 'id'
            "endpoint": {
                # No data selector needed as data is the top-level array
                 "data_selector": "$", # Select the whole list
                 # page_param is handled by the header_link paginator internally
                 # Default request parameters for all resources
                 "params": {
                    "page_size": 100 # Default from OpenAPI spec
                 }
            }
        },
        "resources": [
            {
                "name": "customers",
                "endpoint": {
                    "path": "customers",
                    # Inherits default pagination and params
                }
            },
            {
                "name": "orders",
                "endpoint": {
                    "path": "orders",
                    "params": {
                        # Apply the specific date range filter for this resource
                        "start_date": "2017-01-01",
                        "end_date": "2017-01-10",
                         # Keep default page_size - inherited from resource_defaults now
                    },
                    # Incremental loading configuration for orders based on 'ordered_at'
                    # Uses the 'start_date' parameter which we also use for the initial filter
                    "incremental": {
                        "cursor_path": "ordered_at",
                        # Set initial value to the start of the desired range
                        # Note: The API might ignore this if start_date is also present?
                        # Let's keep it aligned with the filter for clarity.
                        "initial_value": "2017-01-01T00:00:00Z", # ISO format often preferred
                        "start_param": "start_date", # API parameter for filtering
                        # end_param could be 'end_date' but we are setting a fixed window initially
                    },
                }
            },
            {
                "name": "products",
                "primary_key": "sku",  # Products use 'sku' as their identifier
                "endpoint": {
                    "path": "products",
                    # Inherits default pagination and params
                }
            },
            {
                "name": "supplies",
                "endpoint": {
                    "path": "supplies",
                    # Inherits default pagination and params
                }
            },
            {
                "name": "stores",
                "endpoint": {
                    "path": "stores",
                    # Stores have 'opened_at', but the spec exposes no filtering
                    # params, so no incremental loading is configured here.
                    # Inherits default pagination and params
                }
            },
        ]
    }
)

if __name__ == "__main__":
    # Configure the pipeline
    pipeline = dlt.pipeline(
        pipeline_name="jaffle_shop_rest_api",
        destination="duckdb",  # Example destination, can be changed
        dataset_name="jaffle_data",
        progress="log", # Add progress logging
        dev_mode=True # Recommended for development
    )

    # Run the pipeline with the configured source
    load_info = pipeline.run(jaffle_shop_source)

    # Print the outcome
    print(load_info) 
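
To sanity-check the load, you can query the resulting duckdb tables through the pipeline's SQL client. A minimal sketch, assuming my_pipeline.py has already run:

import dlt

# Re-attach to the pipeline by name; dlt restores its state and destination
pipeline = dlt.pipeline(
    pipeline_name="jaffle_shop_rest_api",
    destination="duckdb",
    dataset_name="jaffle_data",
)

# Run an ad-hoc query against the loaded dataset
with pipeline.sql_client() as client:
    with client.execute_query("SELECT count(*) FROM orders") as cursor:
        print(cursor.fetchall())  # row count for the loaded 'orders' table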

Conclusion

I've built dozens of REST API integrations over the years. The pagination debugging. The schema evolution headaches. The rate limiting dance.

Every single time, I told myself "this is just how it is with APIs."

But it's not.

With dlt and Cursor, I finally escaped the integration grind. What used to take days now takes minutes. What used to require deep API knowledge now works with a simple prompt.

This isn't just about faster development; it's about getting back to the work that actually matters. Building insights, not battling boilerplate.

Want to reclaim your time?

Stop grinding. Start building!