product principles & vision

dltHub is inspired by the open-source movement that drove the machine learning revolution. Just as Python's open-source tools power that revolution, with dlt we aim to bring the same open-source principles to enterprise data.

a library, not a platform #

dlt is a library. When you add a library to your code, it belongs to you. When you add your code to a platform, that code belongs to the platform.

no black boxes, instead customizable code #

You can always peek inside, customize, and hack our code.

Take our data sources, for example.

Our data sources are code. They are designed to be inspected, customized, and hacked. We distribute them as plain code with dlt init, so hacking and customization are available to our end users as well.
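
As a minimal sketch of what such a source looks like: the endpoint and field layout below are made up, and the point is simply that the whole source is a plain, editable Python function.

```py
import dlt
import requests

@dlt.resource(name="issues", write_disposition="append")
def issues(api_url: str = "https://api.example.com/issues"):  # hypothetical endpoint
    # Plain Python you can read, step through, and change:
    # fetch one page of records and yield them to the pipeline.
    response = requests.get(api_url)
    response.raise_for_status()
    yield from response.json()
```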

multiply - don’t add - to our productivity #

We aim to eliminate repetitive and mundane tasks by automating our own and our users' work. By doing so, we pass productivity gains on to them and their companies. We want the productivity gains associated with dlt to compound.

  • We automate the extract, normalize, and load steps with schema inference, schema evolution, and namespaces. Our goal is to make pipelines as close to “fire and forget” and “zero maintenance” as possible.
  • We identify patterns in how our users work and convert them into decorators.
  • We want to surface as many behaviors as decorators as possible, so that users can simply declare a behavior instead of having to code it (see the sketch after this list).
  • We are normie software: we value our users' time and the effort it takes to learn. We design our library so that the tricks you learn here can be applied everywhere.
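
A minimal sketch of what declaring behavior with a decorator looks like; the table, its fields, and the fetch_orders helper are hypothetical, while write_disposition, primary_key, and dlt.sources.incremental are the kinds of declarations dlt provides.

```py
import dlt

def fetch_orders(since):
    # Placeholder for an API call; returns records changed after `since`.
    return [{"id": 1, "status": "shipped", "updated_at": "2024-05-01T10:00:00Z"}]

@dlt.resource(write_disposition="merge", primary_key="id")
def orders(updated_at=dlt.sources.incremental("updated_at", initial_value="2024-01-01T00:00:00Z")):
    # Declared, not coded: dlt tracks the highest "updated_at" seen so far
    # and merges new rows into the destination table on "id".
    yield from fetch_orders(since=updated_at.last_value)
```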

code generation is automation on steroids #

Our users rely on code generation daily to increase their productivity, and dlt as a library needs to fit into their workflows. We fully embrace code generation, whether with state-of-the-art LLMs or algorithmic code generation.

  • We constantly look for user tasks that can be automated and experiment with how to automate them.
  • When you use dlt init, we create the pipeline code for your source and destination and generate sample configuration and credentials files (see the sketch after this list).
  • When you use dlt deploy, we analyze your pipeline script and traces from previous runs to generate deployment files and credentials in various formats.
  • We answer your questions and generate code snippets with our GPT-4 dhelp assistant, for which we split our docs and code into meaningful pieces.
  • We generate dlt pipelines from OpenAPI specifications: just as you can get a regular Python client for any FastAPI service, you can get an advanced pipeline for free.
  • We experiment with generating data sources that can parse any unstructured data. The next step is to generate Python data source code automatically.
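
For illustration, a simplified sketch of the shape of a pipeline script that dlt init produces; the github_source name and its import are placeholders, and the real generated file comes alongside .dlt/config.toml and .dlt/secrets.toml templates.

```py
import dlt

# In a real project, the source module is copied into your folder by dlt init
# and imported here; this import is a placeholder.
from github import github_source

def load_github() -> None:
    pipeline = dlt.pipeline(
        pipeline_name="github_pipeline",
        destination="duckdb",
        dataset_name="github_data",
    )
    # Credentials are read from .dlt/secrets.toml or environment variables.
    load_info = pipeline.run(github_source())
    print(load_info)

if __name__ == "__main__":
    load_github()
```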

pythonic: declarative, intuitive, with no learning curve #

If you know Python, you should be able to use dlt right away. We go to great lengths to reduce our users' effort, even at the cost of our own development effort. The library API is composed of well-known Python primitives and is transparent in how it uses them.

  • All our sources and resources, even parallel or async ones, are just Python iterators and fit anywhere an iterator fits.
  • You can do 90% of the work by decorating regular functions.
  • We go to great lengths to reduce the pain of handling credentials (see the sketch after this list).
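
A sketch of both points, assuming a hypothetical events resource: the API key is injected by dlt rather than hard-coded, and the resource is consumed like any other iterator.

```py
import os
import dlt

os.environ["API_KEY"] = "demo-key"  # stands in for .dlt/secrets.toml in this sketch

@dlt.resource
def events(api_key: str = dlt.secrets.value):
    # dlt resolves api_key from secrets.toml or environment variables,
    # so credentials never get hard-coded in the function body.
    yield {"id": 1, "kind": "click"}
    yield {"id": 2, "kind": "view"}

# A resource is just a Python iterator: it fits anywhere an iterator fits.
for row in events():
    print(row)
```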

If you like these product principles, here is what engineers who build internal data tools create with dlt:

build a data pipeline #

  • In a data pipeline built with dlt, you get structured data almost instantly thanks to automatic schema inference and decorators for incremental and merge loads.
  • A data pipeline run with dlt is self-maintaining and scalable: it will not break when new data appears or when your chunk of data is suddenly 100x larger.
  • With dlt you write just the extract part of ETL, so it is less work than your custom glue code. Load the JSON as it comes; don't curate unknowns (see the sketch after this list).
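
A minimal sketch of loading the JSON as it comes; the records are made up, and dlt infers the schema, absorbs the new field, and unpacks the nested list into a child table.

```py
import dlt

raw_json = [
    {"id": 1, "user": {"name": "Ada"}, "tags": ["alpha"]},
    {"id": 2, "user": {"name": "Grace", "country": "US"}, "tags": ["beta", "gamma"]},  # new field appears
]

pipeline = dlt.pipeline(pipeline_name="raw_events", destination="duckdb", dataset_name="events")
# Schema inference and evolution happen automatically; "tags" becomes a child table.
load_info = pipeline.run(raw_json, table_name="events")
print(load_info)
```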

build a data warehouse with dlt - the human way #

build a structured data lake - tech agnostic, automated #

  • Structured data lakes, such as Parquet-based lakes, are the most usable format for raw data. dlt supports building structured data lakes on the major technologies you already use (see the sketch after this list).
  • By automating schema inference, evolution, normalization, and typing, dlt enables you to load your JSON data into a structured data lake, allowing for better data governance and less manual work.
  • dlt makes your team happy by automating tedious tasks and offering solutions to common problems.
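
A sketch of writing typed Parquet files instead of raw JSON, assuming the filesystem destination; the local path is a placeholder for whatever bucket or lake storage you already use.

```py
import dlt

data = [
    {"id": 1, "payload": {"status": "ok"}},
    {"id": 2, "payload": {"status": "late", "retries": 3}},
]

pipeline = dlt.pipeline(
    pipeline_name="structured_lake",
    destination=dlt.destinations.filesystem("file:///tmp/dlt_lake"),  # could be an s3:// or gs:// bucket URL
    dataset_name="raw_zone",
)
# Data lands as normalized, typed Parquet files rather than raw JSON blobs.
pipeline.run(data, table_name="events", loader_file_format="parquet")
```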