
Deep dive: Our initial assistants and model context protocol (MCP) on the Continue Hub

  • Matthaus Krzykowski,
    Co-Founder & CEO

The data infrastructure for trustworthy data engineering AI has yet to be invented

Large language models (LLMs) have helped scale analysts’ and data scientists’ efforts to transform and analyze large volumes of data via “Text-to-SQL”. Yet data engineers remain skeptical of the usefulness of LLMs in their work, and reports of successful AI tooling are scarce.

A key difference is that data engineering work involves heterogeneous live production systems (APIs, databases, compute clusters, queues) rather than mostly static datasets. Many challenges remain, amongst them:

  1. We haven’t figured out how to pass this information as text to LLMs
  2. We need a standardized approach to avoid implementing backends one by one
  3. To be trusted, LLMs need to understand complex enterprise data metrics, often defined by thousands of rules

In last week’s post, we detailed these challenges and our broader vision towards an initial data infrastructure that generates trusted data.

We released the initial building blocks and assistants to work towards this vision with the community

dlt & dlt+ assistants and building blocks

This week, we’re sharing technical details about our initial dlt and dlt+ assistants as well as our servers for Anthropic’s Model Context Protocol (MCP).

We released both last week on the Continue Hub. You can access all the assistants and building blocks from there.

We aim to work with dlt users as well as the wider community on steps towards this vision.

This post is aimed at people:

  • building with dlt and dlt+ and considering assistants
  • interested in creating their own custom assistants and reusing our building blocks, such as the dlt and dlt+ MCP servers, in their tooling or platform

The open-source dlt Assistant helps you develop, debug, and inspect your dlt data pipelines

The demo above shows how to:

  • Inspect & debug data pipeline execution
  • Retrieve schema metadata and dataset records
  • Quickly understand load errors, timings, and file sizes
  • Automate repetitive debugging workflows
  • Create a new pipeline from scratch

We built this assistant as a first step because we wanted to provide dlt knowledge and tools inside the IDE. This should help developers learn dlt and build more efficiently while remaining in control. Gradually, useful patterns will emerge and can be refined into more autonomous agentic workflows.

Therefore, the dlt Assistant is built on a core dlt construct: the Pipeline. A pipeline connects a data Source, which can organize many Resources, to a Destination.
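
For readers new to these constructs, here is a minimal sketch of how they fit together; the resource, pipeline name, and DuckDB destination are illustrative choices for this example, not part of the assistant:

```python
import dlt

# A Resource yields data items; a Source can group many Resources.
@dlt.resource(name="users")
def users():
    yield [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

# The Pipeline connects a Source (or a bare Resource) to a Destination.
pipeline = dlt.pipeline(
    pipeline_name="assistant_demo",  # illustrative name
    destination="duckdb",
    dataset_name="demo_data",
)

load_info = pipeline.run(users())
print(load_info)  # timings, file sizes, errors: the data the assistant inspects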

dlt Assistant workflow

After executing your pipeline, the dlt Assistant can use tools to retrieve the following:

  • Load information: load timings, errors, file sizes, and all the information associated with the pipeline execution. This helps you understand when pipelines were executed, compare runs, and inspect for anomalies.
  • Metadata: the dlt schema is a lightweight, easy-to-distribute document describing the data semantics (types, nullability, relationships, etc.). This separation avoids having LLMs query the destination repeatedly.
  • Datasets: the dataset interface provides a consistent and simple way for LLMs to retrieve records from any destination. This enables simple text-to-SQL and serves as a foundation for advanced solutions.
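
To make these three categories concrete, here is a sketch of how the same information can be retrieved directly through dlt’s Python API, assuming the `assistant_demo` pipeline from above has already run (exact attribute names may vary across dlt versions):

```python
import dlt

# Attach to a previously run pipeline by name
pipeline = dlt.attach(pipeline_name="assistant_demo")

# Load information: timings, errors, and file sizes from past runs
print(pipeline.last_trace)

# Metadata: the dlt schema as a portable YAML document
print(pipeline.default_schema.to_pretty_yaml())

# Datasets: a consistent interface to records on any destination
print(pipeline.dataset()["users"].df().head())
```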

With access to both the pipeline code and the various data it produces, we’re developing iterative workflows to debug and test dlt pipelines.

The dlt+ Assistant lets everyone interact with dlt+ Projects, destinations, and pipelines

This demo shows how, using the dlt+ Project YAML interface, you can:

  • set up a dlt+ Project
  • add or modify a dlt+ source
  • add or modify a dlt+ destination
  • add or modify a dlt+ pipeline
  • run and preview a dlt+ pipeline

In a dlt+ Project, your whole team can work on dlt sources, destinations, and pipelines in a standardized way via a YAML Project manifest. This lets you quickly bootstrap projects or, for instance, enable self-serve datasets for data scientists. The dlt+ Assistant is a companion product for teams working on a dlt+ Project. It allows non-technical stakeholders to interact with a dlt+ Project.
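
To give a feel for the YAML interface, here is a sketch of what such a manifest can look like; the entity names are illustrative and the exact schema is defined by dlt+:

```yaml
# dlt.yml: illustrative dlt+ Project manifest (entity names are hypothetical)
sources:
  github_events:
    type: rest_api

destinations:
  warehouse:
    type: duckdb

pipelines:
  github_pipeline:
    source: github_events
    destination: warehouse
    dataset_name: github_data
```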

dlt+ workflow

Conceptually, the dlt+ Assistant is an extension of the basic assistant that leverages dlt+ features. In its final form, it is designed to provide a more comprehensive understanding of your project or data platform.

For future versions of the dlt+ Assistant we are working on the following features:

  • Platform-wide data catalog for trusted data: adding Iceberg support and data catalogs to provide data quality as customers run hundreds of dlt pipelines in LLM workflows
  • Augmented data transformation: in the future, users will be able to interact with the dlt+ Cache to create a local dev data environment and freely use LLMs and agents to create and edit data transformations. This is great for pipeline developers and can serve as a basis for text-to-SQL engines.

How to start using our assistants

  1. Install the Continue extension (VS Code or JetBrains)
  2. Sign up on Continue
  3. Add your OPENAI_API_KEY or ANTHROPIC_API_KEY to the Secrets section
  4. Select the dlt Assistant and click “Open in VSCode”
  5. Now, the dlt Assistant should be available in your IDE!

On Continue Hub, you can customize your assistant by adding, removing, or editing blocks. You can also share configurations across your team. When using the assistant directly from dlthub, you will automatically receive updates we publish!
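As a rough illustration, a Continue Hub assistant is defined by a YAML config that composes reusable blocks; the slugs below are hypothetical placeholders, so check the Hub for the actual dlt block names:

```yaml
# config.yaml: hypothetical Continue assistant composed from hub blocks
name: my-dlt-assistant
version: 0.0.1
schema: v1

models:
  - uses: anthropic/claude-3-5-sonnet  # placeholder model block

mcpServers:
  - uses: dlthub/dlt-mcp  # placeholder slug for the dlt MCP block
```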

Integrating dlt in your favorite AI-enabled tool

The dlt Assistant and dlt+ Assistant as part of the Continue Hub are possible because of several components working together:

  • User: it’s you, the person writing code, chatting, and interacting with the assistant
  • Client / IDE: the tool with the interface to access the assistant, in this case Continue
  • LLM: the model that takes text as input, reasons, and outputs text
  • MCP Server: a service able to execute Python code to interact with dlt and your data

Next is a sequence diagram showing how a user request is handled by the assistant:

MCP

The key takeaway is that the Client / IDE occupies the central role; the User, LLM, and MCP Server only ever interact through the Client / IDE. This also means you can reuse the dlt MCP and the dlt+ MCP with any Client that supports the Model Context Protocol and pair them with your favorite LLM.

If you’re building a platform or developer tooling on top of dlt, you can directly integrate with the dlt MCP too!
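
If you want a feel for the pattern, here is a sketch of exposing dlt functionality as MCP tools via the official MCP Python SDK’s FastMCP helper. This is not the actual dlt MCP server; the tool names are invented for illustration:

```python
import dlt
from mcp.server.fastmcp import FastMCP

# A toy MCP server; the real dlt MCP ships ready-made on the Continue Hub.
mcp = FastMCP("dlt-demo")

@mcp.tool()
def get_schema(pipeline_name: str) -> str:
    """Return the schema of a previously run dlt pipeline as YAML."""
    pipeline = dlt.attach(pipeline_name=pipeline_name)
    return pipeline.default_schema.to_pretty_yaml()

@mcp.tool()
def peek_table(pipeline_name: str, table: str, limit: int = 10) -> str:
    """Fetch a handful of records from a table in the destination."""
    pipeline = dlt.attach(pipeline_name=pipeline_name)
    return pipeline.dataset()[table].df().head(limit).to_string()

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, so any MCP client can connect
```

Any MCP-capable client could then launch a server like this and pair it with the LLM of your choice.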

Next steps

  • Let us know what you think about the assistants, building blocks and our vision!
  • We are particularly curious to talk to companies with data engineering teams that embrace AI code editors such as Continue, Cursor, or Windsurf and learn about their needs and workflows.
  • Your feedback is appreciated as we work towards doing our part in inventing a data infrastructure for trustworthy data engineering AI.