dltHub Blog

Emmanuel's production-ready Kafka framework: extending dlt the right way

  • Aman Gupta,
    Data Engineer

Kafka in production isn’t “hello world.” It’s oh no world.

Think: messy topics, shifting schemas, and the occasional midnight panic when offsets decide to cosplay as Schrödinger’s cat.

Enter Emmanuel Ogunwede. Instead of rage-quitting Kafka (tempting) or spinning up a Spark cluster the size of a minor moon, he built a slim framework on top of dlt that levels up the vanilla Kafka source into something you’d actually trust to run in production.

👉 Check out the repo on GitHub.

What vanilla dlt gives you (and where it stops short)

dlt’s built-in Kafka source is great if you just need a simple pipeline up and running:

  • You point it at specific topics
  • It happily ingests UTF-8 text (JSON out of the box)
  • But there's no Schema Registry integration
  • And topics added after the first run need manual handling

Perfect for getting started, and honestly way easier than most first tries at Kafka.

But in production, Kafka throws curveballs, and that's where Emmanuel's framework comes in.

Emmanuel’s upgrades 🚀

Instead of reinventing the wheel, Emmanuel identified the specific gaps and filled them systematically:

  • Dynamic topic discovery via regex patterns (.*_events finds all event topics automatically)
  • Avro + Schema Registry support with proper deserialization and schema evolution
  • Clean CLI interface that feels like a real tool
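The regex discovery bullet is easy to picture. A minimal, stdlib-only sketch of the idea (the real source would pull the topic list from broker metadata via the Kafka admin client; here it's a hardcoded list):

```python
import re

def discover_topics(all_topics: list[str], pattern: str) -> list[str]:
    """Return broker topics whose names fully match the regex pattern."""
    rx = re.compile(pattern)
    return sorted(t for t in all_topics if rx.fullmatch(t))

# Hypothetical broker metadata; in practice this comes from the Kafka admin API.
topics = ["orders_events", "clicks_events", "users",
          "payments_events", "__consumer_offsets"]

print(discover_topics(topics, r".*_events"))
# ['clicks_events', 'orders_events', 'payments_events']
```

Because matching runs on every pipeline invocation, topics created after the first run are picked up automatically on the next one, which is exactly the gap in the vanilla source.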

The genius? He built on top of dlt, not around it.
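Under the hood, "Avro + Schema Registry support" mostly comes down to two things: parsing Confluent's wire format (a 0x00 magic byte plus a 4-byte big-endian schema ID in front of the Avro body) and caching schema lookups so the registry isn't hit once per message. A stdlib-only sketch of that pattern; the actual Avro decoding and the registry HTTP call are stubbed out here with a fake fetch, so this is illustrative, not Emmanuel's exact code:

```python
import struct

MAGIC_BYTE = 0

def parse_confluent_frame(message: bytes) -> tuple[int, bytes]:
    """Split a Confluent wire-format message into (schema_id, avro_payload).

    Confluent's framing: 1 magic byte (0x00) + 4-byte big-endian schema ID,
    followed by the Avro-encoded body.
    """
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("not a Confluent-framed message")
    return schema_id, message[5:]

class SchemaCache:
    """Memoize schema lookups so the registry is hit once per schema ID."""
    def __init__(self, fetch):
        self._fetch = fetch  # e.g. a call to GET /schemas/ids/{id}
        self._cache: dict[int, str] = {}

    def get(self, schema_id: int) -> str:
        if schema_id not in self._cache:
            self._cache[schema_id] = self._fetch(schema_id)
        return self._cache[schema_id]

# Demo with a stubbed registry fetch (records how often it's called).
calls = []
def fake_fetch(sid):
    calls.append(sid)
    return f'{{"type": "record", "id": {sid}}}'

cache = SchemaCache(fake_fetch)
frame = bytes([MAGIC_BYTE]) + (42).to_bytes(4, "big") + b"payload"
sid, body = parse_confluent_frame(frame)
cache.get(sid)  # first call hits the "registry"
cache.get(sid)  # second call is served from cache
print(sid, body, len(calls))
# 42 b'payload' 1
```

Schema evolution then falls out naturally: a new schema ID in the frame triggers one fetch, and dlt's own schema inference absorbs the resulting column changes downstream.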

See it in action 🎥

Emmanuel even made a video walking through it (watch at 2× speed if you’re impatient).

The takeaway: the framework handles topic discovery, schema fetching, offset management, and loading for you.

Why this matters

Most Kafka setups live on two extremes:

  • Too complex: Spark/Flink sized for Mars missions
  • Too hacky: cron + script duct-taped together

Emmanuel found the middle ground: micro‑batch ingestion that’s production‑ready and maintainable.

It builds on what dlt already does well (schema evolution, normalization, datatype inference) and extends it to handle the messy realities of Kafka:

  • Topics appearing and disappearing
  • Offset management across restarts
  • Multiple serialization formats
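Offset management across restarts is the trickiest of the three. The pattern: after each successful load, checkpoint the next offset to read per (topic, partition), and resume from that checkpoint on startup. A toy stdlib sketch of the resume logic; the framework presumably persists this through dlt's pipeline state rather than a JSON file on disk:

```python
import json
from pathlib import Path

class OffsetStore:
    """Persist per-(topic, partition) offsets so restarts resume, not re-read."""
    def __init__(self, path: Path):
        self.path = path
        self.offsets: dict[str, int] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def next_offset(self, topic: str, partition: int) -> int:
        # Unknown partitions start from offset 0 on the very first run.
        return self.offsets.get(f"{topic}:{partition}", 0)

    def commit(self, topic: str, partition: int, last_offset: int) -> None:
        # Record the *next* offset to read, then flush to disk.
        self.offsets[f"{topic}:{partition}"] = last_offset + 1
        self.path.write_text(json.dumps(self.offsets))

# Simulated run: read through offset 99 on partition 0, "crash", restart.
import tempfile
state = Path(tempfile.mkdtemp()) / "offsets.json"
store = OffsetStore(state)
assert store.next_offset("orders_events", 0) == 0
store.commit("orders_events", 0, last_offset=99)

store2 = OffsetStore(state)  # a fresh process after restart
print(store2.next_offset("orders_events", 0))
# 100
```

Committing only after a load succeeds gives you at-least-once delivery; dlt's merge write disposition can then deduplicate any replayed messages at the destination.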

Build smarter, not bigger

Don't fight your tools; extend them thoughtfully.

Emmanuel didn’t rage‑quit dlt because it lacked Avro. He bridged the gaps and ended up with something that feels like a natural extension, not a replacement.

Try it now

📖 Read Emmanuel’s implementation and technical design doc.

⭐ Don’t forget to give the repo a star.

Emmanuel showed us that production-grade doesn't have to mean complex. Sometimes it just means being thoughtful about the details that matter.