product principles & vision
dltHub is inspired by the open-source movement that drove the machine learning revolution. Just as Python's open-source tools power that revolution, with dlt we aim to bring the same open-source principles to enterprise data.
a library, not a platform #
dlt is a library. When you add a library to your code, it belongs to you. When you add your code to a platform, it belongs to the platform.
- We respect our users’ existing workflows and the tools they use. We want to be a part of your code - regardless of what you do - without owning it.
- We do not replace your data platform, deployments, or security models. We fit in and supercharge them with dlt’s extract and load capabilities.
- We want to be a part of other libraries - be it open source libraries or internal libraries.
- We want to be a part of the extract and load part of any data platform - open source or internal.
- We not only want to fit in with what our users do; we also write helpers that make your tools and workflows more effective with our library (e.g., dbt, Airflow, Streamlit).
- We work wherever Python works. No backends, no cloud, no containers, and no hidden dependencies.
no black boxes, instead customisable code #
Take our data sources, for example.
Our data sources are code. They are designed to be inspected, customized, and hacked. We distribute data sources as plain code with dlt init, so hacking and customization are in the hands of our end users.
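To make "sources are code" concrete, here is a hedged sketch of what a scaffolded resource can look like. The function name, fields, and URL are illustrative, not dlt's actual scaffold; in a real dlt project the generator would carry the @dlt.resource decorator, but the point stands in plain Python: a source is just code you can read and change.

```python
# Illustrative sketch only: a "source" is a plain Python generator that
# yields dicts. In a real dlt project this function would be decorated
# with @dlt.resource; here it is bare so the shape is easy to inspect.
def players(usernames):
    """Yield one record per player; API calls and pagination would go here."""
    for name in usernames:
        # a real source would call the upstream API at this point
        yield {"username": name, "profile_url": f"https://example.com/member/{name}"}

rows = list(players(["magnus", "hikaru"]))
```

Because the source is ordinary code, customizing it is just editing a function, not configuring a black box.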
multiply - don’t add - to our productivity #
We aim to eliminate repetitive and mundane tasks by automating our own and our users' work. By doing so we pass on productivity gains to them and their companies. We want productivity gains associated with dlt to compound.
- We particularly automate the extract, normalize, and load process by employing schema inference, evolution, and namespaces. Our goal is to make our pipelines as much “fire and forget” and “zero maintenance” as possible.
- We identify patterns in how our users work and convert them into decorators.
- We want to surface as many decorators to our users as possible, so that they can simply declare a behavior instead of having to code it.
- We are normie software. We value our users' time and effort when learning. We design our library so that the tricks you learn here can be applied everywhere.
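The schema inference and evolution mentioned in the first bullet can be sketched roughly as follows. This is an illustrative toy, not dlt's actual engine: each column maps to a type, and rows with new columns widen the schema instead of breaking the load.

```python
# Minimal sketch of schema inference + evolution (illustrative only,
# not dlt's implementation): map each column to a type name, and let
# previously unseen columns extend the schema rather than fail the load.
def infer_schema(rows, schema=None):
    schema = dict(schema or {})
    for row in rows:
        for col, val in row.items():
            # keep the first-seen type; add new columns as they appear
            schema.setdefault(col, type(val).__name__)
    return schema

schema = infer_schema([{"id": 1, "name": "ada"}])
# later the data evolves: a new "paid" column simply joins the schema
schema = infer_schema([{"id": 2, "paid": True}], schema)
```

This is the "fire and forget" property in miniature: new data shapes evolve the schema rather than requiring a pipeline rewrite.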
code generation is automation on steroids #
Our users use code generation daily to increase their productivity, and dlt as a library needs to work with their workflows. We fully embrace code generation, whether with state-of-the-art LLMs or algorithmic code generation.
- We constantly look for user tasks that can be automated and experiment on how it can be done.
- When you use dlt init we create the pipeline code for your source and destination and generate the sample configuration and credentials files.
- When you use dlt deploy we analyze your pipeline script and traces from previous runs to generate deployment files and credentials in various formats.
- We answer your questions and generate code snippets with our GPT-4-powered dhelp assistant, for which we extracted and split our docs and code into meaningful pieces.
- We generate dlt pipelines from OpenAPI specifications. Just as you can get a regular Python client for any FastAPI service, you can get an advanced pipeline for free.
- We experiment with generating data sources that can parse any unstructured data. The next step is to generate the Python data source code automatically.
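As a toy illustration of algorithmic code generation (a hedged sketch; dlt's real OpenAPI generator is far more capable, and the function name here is hypothetical), one first step is deriving resource names from a spec's GET endpoints:

```python
# Toy sketch: derive one candidate resource name per GET endpoint in an
# OpenAPI spec dict (illustrative only; not dlt's actual generator).
def resources_from_openapi(spec):
    names = []
    for path, ops in spec.get("paths", {}).items():
        if "get" in ops:
            # "/users/{id}/orders" -> "users_{id}_orders"
            names.append(path.strip("/").replace("/", "_") or "root")
    return names

spec = {"paths": {"/users": {"get": {}}, "/users/{id}/orders": {"get": {}}}}
resources_from_openapi(spec)  # -> ["users", "users_{id}_orders"]
```

A real generator would go on to emit pagination, authentication, and typed resource functions from the same spec.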
pythonic: declarative, intuitive, with no learning curve #
If you know Python, you should be able to use dlt right away. We go to great lengths to reduce user effort, even at the cost of our own development effort. The library API is composed of well-known Python primitives and is transparent in how it uses them.
if you like it #
Engineers who build internal data tools and love the dlt product principles use it to build things like the following:
build a data pipeline #
- In a data pipeline built with dlt, you get structured data almost instantly, thanks to automatic schema inference and decorators for incremental and merge loads.
- A data pipeline run with dlt is self-maintaining and scalable: it will not break when new data arrives or when your chunk of data is suddenly 100x larger.
- A data pipeline built with dlt is just the extract and load part of ELT, so it's less work than your custom glue code. Load the JSON as it comes; don't curate unknowns.
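The merge load mentioned above boils down to an upsert by primary key. Here is a hedged pure-Python sketch of that behavior (illustrative, not dlt internals, which perform the merge in the destination database): new keys are inserted, existing keys are overwritten, so re-running a load cannot duplicate rows.

```python
# Sketch of a merge (upsert) load step (illustrative; dlt's merge write
# disposition does this inside the destination, not in Python memory).
def merge_load(table, batch, key="id"):
    by_key = {row[key]: row for row in table}
    for row in batch:
        by_key[row[key]] = row  # insert new keys, overwrite existing ones
    return list(by_key.values())

table = [{"id": 1, "state": "open"}]
table = merge_load(table, [{"id": 1, "state": "closed"}, {"id": 2, "state": "open"}])
# -> row 1 updated in place, row 2 appended; no duplicates on re-runs
```

Because the operation is idempotent per key, retries and backfills are safe by construction.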
build a data warehouse with dlt - the human way #
- Standardise loading: don't structure or clean data manually - dlt's powerful normaliser handles timestamps, column names, and anything else needed to give you nicely structured data.
- Schema evolution lets you load all the data first and curate it later, and you are notified of changes so you can adjust to new realities immediately.
- Ready-built sources and recipes for common problems or orchestration can be found in our docs.
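The normalisation step in the first bullet can be sketched roughly as follows. This is a toy approximation (dlt's actual naming and flattening rules differ in detail): column names become snake_case and nested objects are flattened with a separator.

```python
import re

# Toy normaliser (illustrative only; dlt's real rules differ in detail):
# snake_case the column names and flatten nested dicts with "__".
def normalize_row(row, prefix=""):
    out = {}
    for col, val in row.items():
        # collapse non-alphanumeric runs to "_" and lowercase the name
        name = prefix + re.sub(r"[^a-zA-Z0-9]+", "_", col).strip("_").lower()
        if isinstance(val, dict):
            out.update(normalize_row(val, name + "__"))
        else:
            out[name] = val
    return out

normalize_row({"User Name": "ada", "Profile": {"Joined At": "2023-01-01"}})
# -> {"user_name": "ada", "profile__joined_at": "2023-01-01"}
```

Automating exactly this kind of tedious renaming and flattening is what lets raw JSON land as clean warehouse tables without manual curation.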
build a structured data lake - tech agnostic, automated #
- Structured data lakes, such as Parquet lakes, are the most suitable format for consuming raw data. dlt supports building structured data lakes on the major technologies you already use.
- By automating schema inference, evolution, normalization, and typing, dlt enables you to load your JSON data into a structured data lake, allowing for better data governance and less manual work.
- dlt makes your team happy by automating tedious tasks and offering solutions to common problems.