The REST API integration cost: how AI + dlt is finally making it bearable
- Aman Gupta, Jr. Data Engineer
There's a hidden cost in every REST API integration. No, it's not on your cloud bill. You won't see it in a KPI dashboard either. There's no budget review or Jira epic tracking this.
It's the REST API integration overhead. This cost quietly steals your focus, one schema mismatch at a time.
Every time you want to move data from SaaS Tool A to Warehouse B, you pay it. And it's never a one-click checkout. It's an unavoidable detour involving pagination, unpredictable schema changes, and rate limiting.
The root cause is semi-structured JSON data (with unknown schemas), a weakly typed interchange format that clashes with strongly typed analytical destinations. Maintenance adds another layer: homebrewed scripts and custom code that's often messy, undocumented, and challenging to decipher.
This hidden cost isn't just about initial setup; it's the ongoing cognitive load required to manage continuous uncertainty.
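To make that type clash concrete, here's a toy illustration (hypothetical records, not from any particular API): the same field can drift between types across responses, and a strongly typed warehouse column won't absorb that without manual intervention. Drift like this is exactly what dlt's schema inference and evolution handle for you.

page_1 = [{"order_id": 1, "amount": 12.5}]          # amount arrives as a number
page_2 = [{"order_id": 2, "amount": "12.50 USD"}]   # later, as a string

# A typed warehouse column (e.g. DOUBLE) accepts the first record and
# rejects the second; detecting and reconciling this is the hidden cost.
for record in page_1 + page_2:
    print(record["order_id"], type(record["amount"]).__name__)
# 1 float
# 2 str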
Removing the cognitive load and complexity with dlt + AI
Here’s an example of how you can build faster, with less complexity and no tech debt: use an LLM to accelerate your development, and let dlt self-maintain the pipelines.
Here’s the step-by-step:
1. Refer to the dlt docs for setting up Cursor.
2. Run dlt ai setup cursor to auto-generate applicable project rules for REST API sources. It’s convenient for OpenAPI specs or legacy code. You can review the rules in .cursor/rules or via Cursor Settings → Rules.

Here’s a short video showing how we did it.
3. Next, we used the following prompt:
Hey!
I need a dlt REST API pipeline built using this OpenAPI spec:
👉 https://fast-api-jaffle-shop-jz2mh.ondigitalocean.app/openapi.json
The base URL is:
https://jaffle-shop.dlthub.com/api/v1
Can you include the following endpoints in the pipeline?
- customers, orders, products, supplies, stores
A few things to note:
- Check how pagination works for each endpoint. Use curl to hit the endpoints and see what kind of paginator they're using.
- If there's a way to load data incrementally (like using a created_at or updated_at filter), definitely take advantage of it.
- I only need data for the first 10 days of January 2017, so the pipeline should pull just that slice.
Let’s call the final Python file my_pipeline.py.
Ping me if anything’s unclear or if an endpoint behaves weirdly.
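Before (or after) handing this to the LLM, you can run the curl step from the prompt yourself. Here's the same probe as a small Python sketch, assuming the requests library; for this API the next page is advertised in the Link response header, which is what leads to the header_link paginator in the generated code below.

import requests

# Fetch one small page of the customers endpoint and inspect how
# pagination is signalled (the equivalent of the curl step above).
resp = requests.get(
    "https://jaffle-shop.dlthub.com/api/v1/customers",
    params={"page_size": 5},
    timeout=10,
)
resp.raise_for_status()

print(resp.headers.get("Link"))          # e.g. <...?page=2>; rel="next"
print(len(resp.json()), "records on this page")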
4. With a few iterations, Cursor produced code that worked as expected:
- Correct pagination logic based on the API.
- Data loaded only for the specified time frame.
- Incremental loading correctly implemented on the orders endpoint.
Here's the code it produced:
import dlt
from dlt.sources.rest_api import rest_api_source

# Define the base URL for the Jaffle Shop API
BASE_URL = "https://jaffle-shop.dlthub.com/api/v1"

# Configure the dlt REST API source
# No authentication required for this API
jaffle_shop_source = rest_api_source(
    {
        "client": {
            "base_url": BASE_URL,
            # Default pagination: Header Link Paginator
            # This is used by all endpoints in this API
            "paginator": {
                "type": "header_link",
                "links_next_key": "next",  # Default value, matches the API's Link header
            },
            # Default page size parameter used by the paginator lives in resource_defaults
        },
        "resource_defaults": {
            "primary_key": "id",  # Assuming 'id' is the primary key for most resources
            "write_disposition": "merge",  # Merge on 'id'
            "endpoint": {
                # No data selector needed as data is the top-level array
                "data_selector": "$",  # Select the whole list
                # page_param is handled by the header_link paginator internally
                # Default request parameters for all resources
                "params": {
                    "page_size": 100  # Default from OpenAPI spec
                },
            },
        },
        "resources": [
            {
                "name": "customers",
                "endpoint": {
                    "path": "customers",
                    # Inherits default pagination and params
                },
            },
            {
                "name": "orders",
                "endpoint": {
                    "path": "orders",
                    "params": {
                        # Apply the specific date range filter for this resource
                        "start_date": "2017-01-01",
                        "end_date": "2017-01-10",
                        # page_size is inherited from resource_defaults
                    },
                    # Incremental loading configuration for orders based on 'ordered_at'
                    # Uses the 'start_date' parameter which we also use for the initial filter
                    "incremental": {
                        "cursor_path": "ordered_at",
                        # Set initial value to the start of the desired range,
                        # aligned with the start_date filter for clarity
                        "initial_value": "2017-01-01T00:00:00Z",  # ISO format often preferred
                        "start_param": "start_date",  # API parameter for filtering
                        # end_param could be 'end_date', but we set a fixed window initially
                    },
                },
            },
            {
                "name": "products",
                "primary_key": "sku",  # Product identifier is 'sku', overriding the default
                "endpoint": {
                    "path": "products",
                    # Inherits default pagination and params
                },
            },
            {
                "name": "supplies",
                "endpoint": {
                    "path": "supplies",
                    # Inherits default pagination and params
                },
            },
            {
                "name": "stores",
                "endpoint": {
                    "path": "stores",
                    # Stores have 'opened_at', but the API spec doesn't expose filtering params,
                    # so no incremental loading configuration here
                    # Inherits default pagination and params
                },
            },
        ],
    }
)

if __name__ == "__main__":
    # Configure the pipeline
    pipeline = dlt.pipeline(
        pipeline_name="jaffle_shop_rest_api",
        destination="duckdb",  # Example destination, can be changed
        dataset_name="jaffle_data",
        progress="log",  # Add progress logging
        dev_mode=True,  # Recommended for development
    )
    # Run the pipeline with the configured source
    load_info = pipeline.run(jaffle_shop_source)
    # Print the outcome
    print(load_info)
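To confirm what actually landed, here's a minimal follow-up sketch: appended inside the __main__ block after pipeline.run(), it reuses the same pipeline object, so dlt's SQL client resolves the dev_mode dataset name for you. The table names mirror the resource names in the config.

    # Sanity-check the load: count rows per table via dlt's SQL client.
    with pipeline.sql_client() as client:
        for table in ("customers", "orders", "products", "supplies", "stores"):
            count = client.execute_sql(f"SELECT count(*) FROM {table}")[0][0]
            print(f"{table}: {count} rows")

You can also run dlt pipeline jaffle_shop_rest_api show to browse the loaded tables in dlt's built-in Streamlit app.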
Conclusion
I've built dozens of REST API integrations over the years. The pagination debugging. The schema evolution headaches. The rate limiting dance.
Every single time, I told myself "this is just how it is with APIs."
But it's not.
With dlt and Cursor, I finally escaped the integration grind. What used to take days now takes minutes. What used to require deep API knowledge now works with a simple prompt.
This isn't just about faster development; it's about getting back to the work that actually matters: building insights, not battling boilerplate.
Want to reclaim your time?
- Start here: dlt + Cursor documentation
- Get support: Join our Slack community and connect with other developers who've made the switch
- Browse examples: 30+ verified connectors and deployment guides to learn from
Stop grinding. Start building!