
Release highlights: 1.19

New: Arrow streaming with ConnectorX

You can now return results from sql_database queries as an Arrow stream when using the ConnectorX backend.

This allows large query results to be processed incrementally, avoiding loading the full result set into memory.

Example:

from dlt.sources.sql_database import sql_database

db = sql_database(
    backend="connectorx",
    backend_kwargs={
        "return_type": "arrow_stream",  # new in 1.19
    },
)

By default, ConnectorX returns PyArrow tables. Arrow streaming must be explicitly enabled.
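Conceptually, a stream yields record batches one at a time, so you can aggregate over a large result while holding only one batch in memory. A minimal sketch of that consumption pattern (plain Python lists stand in for Arrow record batches; the helper name is illustrative and not part of the dlt API):

```python
from typing import Iterable, List


def count_rows(batches: Iterable[List[dict]]) -> int:
    """Consume batches one at a time; only a single batch is in memory."""
    total = 0
    for batch in batches:
        total += len(batch)
    return total


# Stand-in for an Arrow stream: three small batches instead of one big table.
fake_stream = iter([
    [{"id": 1}, {"id": 2}],
    [{"id": 3}],
    [{"id": 4}, {"id": 5}, {"id": 6}],
])
print(count_rows(fake_stream))  # 6
```

The same pattern applies when the batches come from ConnectorX: each batch is processed and released before the next is fetched.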

Read more →


Visual pipeline run history in the dashboard

The dashboard now includes a visual history of pipeline runs, making it easier to inspect run status, duration, and changes over time.

This provides a clearer overview of pipeline health and helps diagnose failures faster.

[Screenshot: pipeline run history view]

Read more →


Faster Parquet ingestion into MSSQL, MySQL, and SQLite via ADBC

dlt can now ingest Parquet files into SQL databases (MSSQL, MySQL, and SQLite) using ADBC.

When an ADBC driver is available, Parquet loading is enabled automatically and becomes the preferred method. This delivers a 10×–100× speedup compared to INSERT-based loading and is more reliable than CSV fallbacks.

If needed, you can explicitly revert to INSERT loading:

pipeline.run(
    data_iter,
    loader_file_format="insert_values",
)

Read more →


Visualize schemas with Schema.to_mermaid()

You can now export any dlt schema as a Mermaid diagram for quick visualization in documentation, pull requests, or onboarding materials.

schema_mermaid = pipeline.default_schema.to_mermaid()

[Diagram: Mermaid rendering of the schema]

Schemas can also be exported from the CLI and rendered natively in tools like GitHub Markdown and Notion.
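For example, to embed the exported diagram in GitHub Markdown, you can wrap it in a fenced code block tagged with the mermaid language. A small sketch (the helper is illustrative, and the hand-written diagram stands in for real to_mermaid() output):

```python
def to_markdown_mermaid(diagram: str) -> str:
    """Wrap a Mermaid diagram string in a fenced block tagged 'mermaid'
    so GitHub-flavored Markdown renders it as a diagram."""
    fence = "`" * 3  # triple backtick
    return f"{fence}mermaid\n{diagram}\n{fence}"


# A tiny hand-written diagram standing in for pipeline.default_schema.to_mermaid().
schema_mermaid = "erDiagram\n    events {\n        bigint event_id\n    }"
print(to_markdown_mermaid(schema_mermaid))
```

Pasting the resulting block into a README or pull request description renders the diagram inline.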

Read more →


Snowflake clustering key improvements

Snowflake destinations now support updating clustering keys using column hints.

Clustering changes are applied when a table alteration is triggered (for example, when a new column is added), making it easier to tune clustering for large tables without recreating them.

Example:

@dlt.resource(table_name="events")
def events():
    yield {"event_id": 1, "country": "DE"}

events.apply_hints(columns=[{"name": "event_id", "cluster": True}])
pipeline.run(events())

Read more →


Shout-out to new contributors

Big thanks to our newest contributors:


Full release notes

View the complete list of changes →
