
We are releasing dlt+ Project & Cache in early access

  • Matthaus Krzykowski,
    Co-Founder & CEO

Dear community,

This week we are releasing the initial two features of dlt+, our developer framework for running dlt pipelines in production and at scale:

  • dlt+ Project: A declarative yaml collaboration point for your team. (what is it, docs)
  • dlt+ Cache: A database-like portable compute layer for developing, testing, and running transformations before loading. (what is it, docs)

Our journey from dlt to dlt+ in recent months

How did we get here? Back in September, we announced 1000 OSS dlt customers in production and shared that we would begin focusing more on our commercial offerings. In November, we went on a global roadshow to show early prototypes of what has since evolved into dlt+, and we started working with early design partners.

Over the last few months, design partners consistently told us about the challenges of deploying and running dlt in production, about new data infrastructure architectures that include DuckDB and Iceberg, and about engineering teams adopting AI code editors such as Cursor.

dlt+ is the commercial extension to the open source data load tool (dlt). If open source dlt is the developer-first “glue” between a single source and a destination, then dlt+ is the developer-first “glue” that binds data platform components together, creating a cutting-edge custom data platform.

From today onwards, we welcome additional early testers of dlt+ with whom we can collaborate and who will help shape dlt+ further with us.

We will release more dlt+ features over the coming months, and we will communicate our overall vision for dlt+ in the coming weeks.

The initial two dlt+ features

The dlt+ Project


We created dlt+ Project as:

  • A collaborative interface for your dlt sources, pipelines, destinations, and transformations that makes it easy to onboard and collaborate with non-Python developers.
  • An environment with dev/prod profiles that enables local development and gives stakeholders a transparent path to production.
  • A single, authoritative manifest that unifies pipeline definition, deployment, and orchestration.

The broader need & concept:

The initial dlt pipeline is often built and brought into production by a single engineer or a small engineering team. But how should this engineer or engineering team work with the wider company?

The dlt+ Project yaml acts as a manifest, providing a simple interface for managing your pipelines while also enabling packaging, deployment, and orchestration of your workflows with dlt+. The manifest file acts as a single source of truth for dlt work, keeping stakeholders aligned.
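To give a flavor of the manifest, here is an illustrative sketch. The key names below are examples rather than the authoritative dlt+ schema; see the docs linked above for the real format.

```yaml
# Illustrative manifest sketch - key names are examples, not the
# authoritative dlt+ schema (see the docs for that):
sources:
  github_events:
    type: rest_api            # a declarative source definition

destinations:
  warehouse:
    type: bigquery

pipelines:
  events_to_warehouse:
    source: github_events
    destination: warehouse
    dataset_name: events

profiles:
  dev:                        # local development loads to DuckDB instead
    destinations:
      warehouse:
        type: duckdb
  prod: {}                    # production uses the defaults above
```

Because everything lives in one file, a stakeholder can read the pipeline definitions without touching Python, and switching between dev and prod becomes a profile change rather than a code change.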

The dlt+ Cache


We created the dlt+ Cache to enable a few new use cases:

  • Local development for SQL code, enabling faster iteration for developers (many in the dlt community asked for this use case).
  • Test before loading to production, enabling safer, faster, and cheaper data contracts and quality checks.
  • Move compute between vendors with the engine-agnostic layer that translates between SQL dialects.
  • Transform data in SQL (using dbt or sqlmesh) or Python (using Arrow, Pandas, or ibis expressions, optionally compiled to SQL) before loading to a final destination, be it a catalog, a database, or a data lake (see the sketch after this list).
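To make the pattern concrete, here is a minimal sketch of the "transform locally, then load" flow built with open-source dlt and plain DuckDB. The dlt+ Cache productizes this flow; the pipeline names, table names, and the toy filter below are illustrative, not the dlt+ API.

```python
# A minimal sketch of "transform locally, then load" using open-source dlt
# and plain DuckDB. All names below are illustrative, not the dlt+ API.
import dlt
import duckdb

# 1. Load raw data into a local DuckDB file acting as the cache.
raw = dlt.pipeline(pipeline_name="raw_cache", destination="duckdb", dataset_name="staging")
raw.run([{"id": 1, "amount": 10}, {"id": 2, "amount": -5}], table_name="orders")

# 2. Transform and validate locally, before anything reaches production.
#    dlt's duckdb destination writes to "<pipeline_name>.duckdb" by default.
con = duckdb.connect("raw_cache.duckdb")
clean = con.execute("SELECT id, amount FROM staging.orders WHERE amount >= 0").arrow()

# 3. Push the validated Arrow table to the "real" destination
#    (DuckDB again here, so the example runs anywhere).
prod = dlt.pipeline(pipeline_name="prod_load", destination="duckdb", dataset_name="analytics")
prod.run(clean, table_name="orders_clean")
```

Because the intermediate step is just a local DuckDB file, iteration is fast and cheap, and nothing touches production until the final load.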

At the start of early access, the primary use case is running transformations locally, but we are working to support the other use cases in the coming weeks.

The broader need & concept:

The dlt+ Cache is a powerful tool that lets users shift parts of their data workflows earlier in the development process. It is aligned with the principles Josh Wills articulated in "Shift Yourself Left: Integration Testing for Data Engineers" at our November meetup in San Francisco.


Locally, the cache is powered by DuckDB; alternatively, you can bring your own engine, such as BigQuery or Snowflake. You can manipulate cached data and push it back to any dlt destination.
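The engine-agnostic layer mentioned above translates between SQL dialects. Whether dlt+ uses sqlglot internally is an assumption on our part here, but the idea can be illustrated with it:

```python
# Illustration of SQL dialect translation (that dlt+ uses sqlglot
# internally is an assumption; the concept is the same either way).
import sqlglot

# DuckDB-flavored SQL rewritten for Snowflake:
duckdb_sql = "SELECT EPOCH_MS(1618088028295)"
print(sqlglot.transpile(duckdb_sql, read="duckdb", write="snowflake")[0])
```

The same transformation code can thus be developed against the local engine and later executed on a warehouse without a manual rewrite.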

Together with the other components of the platform, the cache enables a complete testing and validation layer for data transformations, combining vendor-unlocked portable code with a local engine or BYO online engine, schema enforcement, debugging tools, and integration with existing data workflows.

Beyond these use cases, the dlt+ Cache was created to serve as a compute engine that can be packaged, Docker-like, together with your code, creating portable data products. dlt+ projects can be packaged as pip-installable datasets, which can then be consumed in notebooks or as API-free data mesh data products or microservices.
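As a purely hypothetical sketch of what consuming such a packaged dataset could feel like (the package name and accessor API below are invented for illustration, not a released dlt+ interface):

```python
# Hypothetical consumption of a pip-installed data product - the package
# name and accessor API are invented for illustration only.
# $ pip install acme-sales-data
import acme_sales_data

# Query the packaged dataset locally: no API server, no warehouse round trip.
orders = acme_sales_data.dataset().table("orders").df()
print(orders.head())
```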

Secure your dlt+ early access

Interested in becoming an early tester? Join dlt+ early access. This is your opportunity to try our commercial product and influence its direction.