Release highlights: 1.12.1
This update includes substantial improvements to developer experience and stability, as well as bug fixes. Here's a breakdown of what's new:
Placeholder config & secret warning system
PR: #2636
Overview
- dlt now detects placeholder values during config and secret resolution (e.g., `"<configure me>"`, or timestamps like `1768-07-21T02:56:00Z`).
- A detailed warning is emitted when placeholders are detected, helping users identify incomplete configuration.
- Prevents confusing errors during pipeline runs and catches misconfigured values early in the development cycle.
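For context, resolution happens whenever a resource or source declares an injected secret; a minimal sketch (the `fetch_users` helper is hypothetical):

```python
import dlt

@dlt.resource
def users(api_token: str = dlt.secrets.value):
    # if secrets.toml still holds the "<configure me>" placeholder for this
    # token, resolving api_token now emits the warning shown below
    yield from fetch_users(api_token)  # fetch_users: hypothetical API client
```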
Example placeholders
```toml
[credentials]
token = "<configure me>"
started_at = "1768-07-21T02:56:73Z" # random date
```
Example warning
```text
Placeholder value encountered when resolving config or secret:
resolved_key: host, value: <configure me>, section: sources.sql_database.credentials.host
Most likely, this comes from init-command, which creates basic templates for non-complex configs and secrets. The provider to adjust is secrets.toml at one of these locations:
- /home/me/project/.dlt/secrets.toml
- /home/me/.dlt/secrets.toml
```
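The fix is to replace the placeholder in `secrets.toml` with a real value. Alternatively, dlt's environment variable provider can supply it; section parts are joined with double underscores (the host value below is illustrative):

```python
import os

# overrides sources.sql_database.credentials.host from the warning above
os.environ["SOURCES__SQL_DATABASE__CREDENTIALS__HOST"] = "db.example.com"
```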
Warning for `None`-only columns
PR: #2633
Overview
- When normalizing data, if a column contains only `None`, dlt can't infer its type.
- This PR adds a clear warning message to help users fix this by providing an explicit type hint.
- If no type hint is given, the column will be excluded from the destination table.
Example warning
```text
The column `my_column` in table `my_table` did not receive any data during this load. Therefore, its type couldn't be inferred.
Unless a type hint is provided, the column will not be materialized in the destination.
```
Recommended fix
```python
import dlt

@dlt.resource(columns={"my_column": {"data_type": "text"}})
def my_data():
    yield {"my_column": None}
```
Regular and standalone resources are now the same thing
Before
- Inner resources could not receive secrets or configs properly.
- Required the `standalone=True` argument.
```python
@dlt.resource(standalone=True)
def foo(access_token=dlt.secrets.value):
    yield from generate_json_like_data(access_token)
```
```toml
[sources.foo]
access_token = "your_token"
```
Now
- All `@dlt.resource` functions behave identically; `standalone=True` is ignored and no longer required.
- Inner resources adopt the config section from the source they are in: `sources.source_module.resource_name` is the canonical form.
Cleaner usage
```python
@dlt.resource
def foo(access_token=dlt.secrets.value):
    yield from generate_json_like_data(access_token)
```
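The token still resolves from config/secrets as before; passing it explicitly skips resolution entirely (a sketch; the destination is illustrative):

```python
import dlt

pipeline = dlt.pipeline(destination="duckdb")
# explicit arguments take precedence over config/secret injection
pipeline.run(foo(access_token="your_token"))
```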
For inner resources:
```python
@dlt.source
def bar():
    @dlt.resource
    def foo(access_token=dlt.secrets.value):
        yield from generate_json_like_data(access_token)

    return foo
```
```toml
[sources.bar.foo]
access_token = "your_token"
```
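Running the source then resolves the inner resource's token from that canonical section; you can also select just the inner resource (a sketch; the destination is illustrative):

```python
import dlt

pipeline = dlt.pipeline(destination="duckdb")
# foo's access_token resolves from [sources.bar.foo]
pipeline.run(bar().with_resources("foo"))
```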
Resources can `return` data (instead of `yield`)
Overview
- You can now return values instead of yielding.
- Useful for static data or short lists.
Example
```python
@dlt.resource
def numbers():
    return [{"key1": 1, "key2": 2}]
```
`return` is automatically wrapped into a generator internally.
warning
Not recommended for large-scale use: a returned value is materialized all at once instead of streamed, and `yield` remains clearer for non-trivial resources.
Performance and stability fixes
PR: #2756
1. Snowflake: extremely slow merge query due to `OR` condition (n² complexity)
- In merge write disposition with the `delete-insert` strategy, the generated SQL used:
```sql
DELETE FROM main_table WHERE EXISTS (
    SELECT 1 FROM staging_table
    WHERE (main_table.pk1 = staging_table.pk1 AND main_table.pk2 = staging_table.pk2)
       OR main_table.merge_key = staging_table.merge_key
);
```
- This caused very slow performance due to the `OR` logic in `EXISTS`, especially on large datasets.
New approach
- Rewrites the logic into two separate `DELETE` queries, executed in the same transaction:
```sql
DELETE ... WHERE EXISTS (JOIN ON primary_key);
DELETE ... WHERE EXISTS (JOIN ON merge_key);
```
info
⚡ Improves performance from 18 hours → 4 seconds on large datasets.
2. Skip deduplication option
- A new flag allows users to disable deduplication during `delete-insert` operations, further reducing merge cost.
- Useful for append-heavy workloads with upstream deduplication.
- Works for any destination that supports `delete-insert`.
```python
@dlt.resource(
    primary_key="id",
    write_disposition={
        "disposition": "merge",
        "strategy": "delete-insert",
        "deduplicated": False
    }
)
def data_resource():
    yield from get_data()
```
Iceberg: `upsert` merge strategy added
PR: #2671
Overview
- Iceberg now supports `merge` with upsert semantics, similar to Delta Lake.
- Enables row-level updates using primary and merge keys.
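A sketch of opting in (the resource, keys, and `get_events` helper are illustrative; assumes a destination with Iceberg table support, e.g. `filesystem`):

```python
import dlt

@dlt.resource(
    primary_key="id",
    write_disposition={"disposition": "merge", "strategy": "upsert"},
    table_format="iceberg",
)
def events():
    yield from get_events()  # hypothetical data helper
```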
warning
Known limitations due to current pyiceberg behavior:
- Nested fields and struct joins are not fully supported in Arrow joins (required by upsert).
- Non-unique keys in input data will raise hard errors — Iceberg enforces strict uniqueness.
- Some failing tests stem from current pyiceberg limitations (e.g., recursion limits, Arrow type mismatches).
Bonus: Postgres now supports `parquet` via ADBC
PR: #2685
```python
import dlt

pipeline = dlt.pipeline(destination="postgres")
pipeline.run(some_source(), loader_file_format="parquet")
```
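Note: loading parquet through ADBC presumably requires the ADBC PostgreSQL driver to be available (e.g., `pip install adbc-driver-postgresql`); see the PR for exact requirements.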