Erase tech debt from data loading with dlt + Cursor + LLMs
- Adrian Brudaru,
Co-Founder & CDO
From Legacy Scripts to Modern dlt Pipelines with Cursor
In this article, we write EL pipelines based on old code and explore how well that works.
I am using the same Cursor setup as in the previous article, where I converted an Airbyte YAML source into a modern, self-contained, resilient Python pipeline.
This time, I am using an actual script I wrote 8 years ago.
And as a small easter egg: it contains an example of an ancient data contract from back then. The process:
1. Load Pipedrive data.
2. Check whether Pipedrive companies exist in Backend Companies.
3. If not, or if there are duplicate assignments, notify the account manager with the specific data point that needs correction (a sketch of this check follows below).
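To make that contract concrete, here is roughly what the check amounts to. This is a hypothetical sketch, not the original script: the field names and the `notify` callable are placeholders.

```python
# Hypothetical sketch of the old "data contract" check described above.
# Field names and the notify() callable are placeholders, not the original code.
from collections import defaultdict


def check_company_assignments(pipedrive_companies, backend_company_ids, notify):
    """Flag Pipedrive companies that break the contract with the backend."""
    owners_by_company = defaultdict(set)

    for row in pipedrive_companies:
        company_id = row["company_id"]
        owners_by_company[company_id].add(row["account_manager"])

        # 1. Company exists in Pipedrive but not in Backend Companies
        if company_id not in backend_company_ids:
            notify(
                row["account_manager"],
                f"Company {company_id} is missing from Backend Companies",
            )

    # 2. Company is assigned to more than one account manager
    for company_id, owners in owners_by_company.items():
        if len(owners) > 1:
            for owner in owners:
                notify(
                    owner,
                    f"Company {company_id} is assigned to multiple managers: {sorted(owners)}",
                )
```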
Anyway, check out the 7-minute video, or read on for the process and the outcomes:
🧩 Problem:
Eight years ago, I wrote a Python script to pull data from the Pipedrive API. It worked, but barely. No tests, no observability, no OOP, no code reuse. It was a one-off tool.
This kind of legacy code exists everywhere. It's not broken, but it's unmaintainable, undocumented, and unscalable.
When I had to hand this code over to the team I hired to take over, it was difficult because the code was complex. In fact, it was around that time that I started to think about how useful a tool like dlt would be.
The specific challenges:
- Scripts were written without modern best practices.
- No modularity or reusability; each endpoint was hand-rolled.
- No structure around schema expectations (e.g. what counts as PII, what must never break).
- Not designed for production: the plan was to just let Airflow handle some of that.
The real problem: You can’t layer good maintenance, governance or observability on top of a pile of trashy glue code.
🛠️ Solution Approach:
Instead of refactoring the old script, I rewrote it from scratch using dlt and Cursor, preserving the original functionality while turning it into:
- A declarative, self-documenting, easily maintainable REST API source that even non-coders can maintain (see the sketch after this list)
- with an explicit schema management strategy via dlt
- and all the other best practices dlt provides out of the box (retries, parallelism, etc.)
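To show what "declarative REST API source" means in practice, here is a minimal sketch of a dlt `rest_api` configuration for Pipedrive. The resource list, secret path, and auth details are illustrative assumptions, not the exact source generated in the video.

```python
import dlt
from dlt.sources.rest_api import rest_api_source

# Illustrative config only: resource names, secret path, and auth details
# are assumptions, not the exact source generated in the video.
pipedrive_source = rest_api_source(
    {
        "client": {
            "base_url": "https://api.pipedrive.com/v1/",
            "auth": {
                "type": "api_key",
                "name": "api_token",  # Pipedrive expects the token as a query parameter
                "location": "query",
                "api_key": dlt.secrets["sources.pipedrive.api_token"],
            },
        },
        # Each endpoint is one line of config instead of hand-rolled request code
        "resources": ["organizations", "persons", "deals"],
    }
)

pipeline = dlt.pipeline(
    pipeline_name="pipedrive",
    destination="duckdb",
    dataset_name="pipedrive_data",
)
pipeline.run(pipedrive_source)
```

Because the endpoints live in config rather than code, adding or removing one is a one-line change that a non-coder can review.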
🚀 Results:
- The resulting pipeline was cleaner, more modular, and fully compliant with dlt’s expectations.
- I can now swap destinations (for example, to Iceberg), run tests, track schema changes, and scale usage.
- We can enforce schema contracts or PII rules (see the sketch after this list).
- It worked so well that you should seriously consider replacing your most error-prone scripts right now.
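As an example of what "enforce schema contracts" looks like, here is a small, self-contained sketch using dlt's `schema_contract` argument; the resource and the specific contract settings are illustrative.

```python
import dlt


@dlt.resource(
    name="organizations",
    primary_key="id",
    write_disposition="merge",
    # Freeze columns and data types: an unexpected schema change now fails loudly
    # instead of silently evolving the destination table.
    schema_contract={"tables": "evolve", "columns": "freeze", "data_type": "freeze"},
)
def organizations():
    # Stand-in data; in the real pipeline this comes from the REST API source
    yield [{"id": 1, "name": "ACME Inc."}]


pipeline = dlt.pipeline(
    pipeline_name="pipedrive_contracts_demo",
    destination="duckdb",  # swapping this value is how you retarget the pipeline
    dataset_name="pipedrive_data",
)
pipeline.run(organizations())
```

With `columns: freeze`, a contract violation raises at load time, which is exactly the kind of guarantee the 8-year-old script tried to approximate by hand.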
🔑 Key Learnings:
- Old code doesn’t age well. But it can be an excellent spec for a clean rebuild.
- dlt makes REST ingestion truly modular, and easy to maintain with Cursor.
- Data contracts are essential, not optional, if you want trust at scale. They mattered 8 years ago, and they matter now.
- Cursor accelerates modernization work, allowing you to test and document at the same time.
- You don’t need to boil the ocean. One legacy script at a time is enough.
Resources
- See the previous post in the series for the Cursor setup: Converting Airbyte YAML to dlt REST API
Call to Action:
Have legacy ETL scripts lurking in your org? You can modernize them with dlt + Cursor in under a day. Stop maintaining legacy.