dltHub

AI built the Pipeline, I plugged the leaks

  • Adrian Brudaru,
    Co-Founder & CDO

🧪 The Experiment

We wanted to test just how far AI could take us in building a data pipeline if we gave it a larger codebase as docs. Specifically, a Singer connector for Pipedrive. The idea was simple: give the model the repo folder and the public API docs, and see if it could write a new pipeline in the self-documenting and self-healing dltHub REST API Source.

No manual coding. No step-by-step prompting. Just: here’s everything, go build.


✅ What Worked

To its credit, the AI got a lot right.

It read the README, extracted the schema, listed all the endpoints, and even attempted pagination. The code it generated wasn’t perfect, but it was functional enough to run — after a few tweaks. And once it did, data started flowing.

That part? Genuinely impressive.

❌ Where It Broke

The paginator config failed. Specifically, the model didn’t know how to use or even import JSONResponseCursorPaginator. Our docs didn’t mention how to do that, and without it, the model had no clue.

It also included a goals resource, carried over from the Singer tap. I assumed it was deprecated and removed it; in reality, that endpoint uses a different request pattern, so the generated version didn't work out of the box. What likely happened is that the LLM saw the endpoint was included but missed its specific implementation, and so implemented it like all the others.

🛠️ How We Fixed It

This wasn’t about rewriting logic. It was about having the right context.

I manually provided the import path:

```python
from dlt.sources.helpers.rest_client.paginators import JSONResponseCursorPaginator
```

No new code. Just config tuning, tracebacks, and knowing what didn’t belong.
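For readers who haven't seen the REST API Source before, here is a minimal sketch of where that paginator fits. This uses the declarative dict form, in which `"type": "cursor"` resolves to JSONResponseCursorPaginator; the cursor field names below are illustrative assumptions, not the exact values from the Pipedrive run.

```python
# Sketch of a dlt REST API Source config using the cursor paginator.
# NOTE: cursor_path / cursor_param values are illustrative assumptions,
# not the exact fields the Pipedrive API returns.
pipedrive_config = {
    "client": {
        "base_url": "https://api.pipedrive.com/v1/",
        "paginator": {
            # Declarative shorthand for JSONResponseCursorPaginator:
            "type": "cursor",
            "cursor_path": "additional_data.next_cursor",  # where the next cursor appears in the response
            "cursor_param": "cursor",                      # query param to send it back with
        },
    },
    "resources": ["deals", "persons", "organizations"],
}
```

The same paginator can also be passed as a class instance instead of a dict, which is exactly where the import above comes in.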

For the goals resource, I simply deleted it, and everything else worked fine.
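Had I wanted to keep it, the right move would have been to give that one endpoint its own resource entry instead of letting it inherit the shared pattern. A sketch, with a hypothetical path and params (not the real Pipedrive goals API shape):

```python
# Sketch: plain strings inherit the client defaults (base_url, paginator, ...);
# a deviating endpoint like "goals" gets its own entry.
# The path and params here are hypothetical placeholders.
resources = [
    "deals",    # inherits the shared request pattern
    "persons",
    {
        "name": "goals",
        "endpoint": {
            "path": "goals/find",                       # hypothetical deviating path
            "params": {"period.start": "2024-01-01"},   # hypothetical required param
        },
    },
]
```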

🧠 What This Really Was

This wasn’t “AI writes the pipeline.” It was “AI proposes a scaffold, and I play tech support.”

But the collaboration worked - not because the model was perfect, but because I knew how the parts were supposed to fit together. It was AI as a super-fast junior dev, not as an architect.

💡 Key Learnings

  • Docs are still critical. Not for people, for models. If your AI’s output is wrong, it probably read a stale or incomplete README.
  • You don’t need to write code to build pipelines. But you do need to understand the pipeline structure well enough to debug it.
  • This was closer to “AI-assisted debugging” than “AI-generated engineering.” Which is still a step forward.
  • Good documentation is machine-readable empathy. The model isn't randomly guessing; it's doing the best it can with the info it read and guessing the most likely thing for the rest.
  • All said and done, the pipeline was written in 9 minutes, so it's really good, just not perfect.

My call to action to you:

Are you still maintaining legacy pipelines? You could be minutes away from cutting-edge, zero-maintenance code: give vibe-modernisation a try. With dlt, you get more declarative and self-documenting code, scale, and self-healing, and you stay destination-agnostic while always keeping up with new things like Iceberg, future-proofing your stack.