extract.extractors
MaterializedEmptyList Objects
class MaterializedEmptyList(List[Any])
A list variant that materializes tables even if an empty list was yielded.
materialize_schema_item
def materialize_schema_item() -> MaterializedEmptyList
Yield this to materialize the schema in the destination, even if there is no data.
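A minimal, dlt-free sketch of how such a marker can work, assuming the extractor skips plain empty lists but checks the type of whatever was yielded. The should_write helper and the users resource are hypothetical illustrations, not dlt's actual extractor code.

```python
from typing import Any, Iterator, List


class MaterializedEmptyList(List[Any]):
    """Marker type: behaves like a list, but its type says 'create the table'."""


def materialize_schema_item() -> MaterializedEmptyList:
    # An empty list whose type survives being yielded from a resource.
    return MaterializedEmptyList()


def should_write(items: List[Any]) -> bool:
    # Hypothetical extractor check: write if there are rows,
    # or if the caller explicitly asked to materialize an empty table.
    return len(items) > 0 or isinstance(items, MaterializedEmptyList)


def users() -> Iterator[List[Any]]:
    # Hypothetical resource that found no rows in the source system
    # but still wants the "users" table created in the destination.
    rows: List[Any] = []
    yield rows if rows else materialize_schema_item()


batches = list(users())
print([should_write(b) for b in batches])  # [True]
print(should_write([]))  # False
```

Because the marker is still a list, code that iterates over yielded items needs no special case; only the "is there anything to write?" check looks at its type.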
with_file_import
def with_file_import(
file_path: str,
file_format: TLoaderFileFormat,
items_count: int = 0,
hints: Union[TResourceHints, TDataItem] = None) -> DataItemWithMeta
Marks the file under file_path to be associated with the current resource and imported into the load package as a file of type file_format.
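To picture what the marking does, here is a minimal sketch that models the returned value as a plain record pairing the file path with its metadata. FileImportItem, with_file_import_sketch and the csv_files resource are hypothetical stand-ins; the real function returns a DataItemWithMeta that dlt's extract pipeline consumes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class FileImportItem:
    # Hypothetical stand-in for the DataItemWithMeta returned by with_file_import.
    file_path: str
    file_format: str  # e.g. "csv" or "parquet"; must be stated explicitly
    items_count: int = 0
    hints: Dict[str, Any] = field(default_factory=dict)


def with_file_import_sketch(
    file_path: str,
    file_format: str,
    items_count: int = 0,
    hints: Optional[Dict[str, Any]] = None,
) -> FileImportItem:
    # Nothing is read here: the file is only tagged for import as-is.
    return FileImportItem(file_path, file_format, items_count, hints or {})


def csv_files(paths: List[str]) -> Iterator[FileImportItem]:
    # Hypothetical resource: each yielded item points at an existing file.
    for path in paths:
        yield with_file_import_sketch(path, "csv")


items = list(csv_files(["/tmp/a.csv", "/tmp/b.csv"]))
print([i.file_path for i in items])  # ['/tmp/a.csv', '/tmp/b.csv']
```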
You can provide optional hints that will be applied to the current resource. Note that you should avoid schema inference at runtime if possible; if that is not possible, do it only once per extract process. Use make_hints in the mark module to create TResourceHints. You can also pass an Arrow table or a Pandas data frame, from which the schema will be taken (the content is discarded).
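The advice to infer a schema at most once per extract process can be followed by caching the inference result and reusing it for every file. In this sketch, infer_columns is a hypothetical, deliberately expensive inference step that counts its own calls; in real code its result would presumably be turned into hints with make_hints, which is not reproduced here.

```python
from functools import lru_cache
from typing import Dict, Iterator, List, Tuple

CALLS = {"infer": 0}


@lru_cache(maxsize=None)
def infer_columns() -> Tuple[Tuple[str, str], ...]:
    # Hypothetical expensive inference (e.g. opening a sample file).
    # lru_cache makes it run at most once per process.
    CALLS["infer"] += 1
    return (("id", "bigint"), ("name", "text"))


def files_with_hints(paths: List[str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    columns = dict(infer_columns())  # cached after the first call
    for path in paths:
        # Every file reuses the same inferred columns.
        yield path, columns


first = list(files_with_hints(["a.csv", "b.csv"]))
second = list(files_with_hints(["c.csv"]))
print(CALLS["infer"])  # 1
```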
If the number of records in file_path is known, pass it in items_count so that dlt can generate correct extract metrics.
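Since an imported file is not opened during extraction, its row count can only come from what the caller declares. The tally below is a hypothetical illustration of that effect, not dlt's metrics code: regular batches are counted by length, imported files by their declared items_count.

```python
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass
class ImportedFile:
    # Hypothetical stand-in for a file marked with with_file_import.
    file_path: str
    items_count: int = 0


def count_extracted(yielded: List[Union[List[Any], ImportedFile]]) -> int:
    total = 0
    for item in yielded:
        if isinstance(item, ImportedFile):
            # The file is never read, so only the declared count is known.
            total += item.items_count
        else:
            total += len(item)
    return total


print(count_extracted([[{"id": 1}, {"id": 2}], ImportedFile("x.parquet")]))  # 2
print(count_extracted([ImportedFile("x.parquet", items_count=1000)]))  # 1000
```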
Note that dlt does not sniff schemas from the data and will not guess the right file format for you.
Extractor Objects
class Extractor()
write_items
def write_items(resource: DltResource, items: TDataItems, meta: Any) -> None
Write items to resource, optionally computing table schemas and revalidating/filtering data.
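A rough, hypothetical sketch of such a write step in plain Python: compute (or extend) a column-type schema from the items, drop rows whose values conflict with a type already recorded, and buffer the rest per table. The function, its parameters and the type-name schema are illustrative assumptions, not the signature or logic of Extractor.write_items.

```python
from typing import Any, Dict, List, Optional


def write_items(
    table: str,
    items: List[Dict[str, Any]],
    buffer: Dict[str, List[Dict[str, Any]]],
    schema: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    # 1. Start from the known schema (column -> Python type name), if any.
    schema = dict(schema or {})
    accepted: List[Dict[str, Any]] = []
    for row in items:
        types = {col: type(val).__name__ for col, val in row.items()}
        # 2. Revalidate: drop rows that conflict with a recorded column type.
        if any(col in schema and schema[col] != t for col, t in types.items()):
            continue
        # 3. Extend the schema with any new columns seen in this row.
        schema.update(types)
        accepted.append(row)
    # 4. Buffer the accepted rows under their table name.
    buffer.setdefault(table, []).extend(accepted)
    return schema


buf: Dict[str, List[Dict[str, Any]]] = {}
schema = write_items("users", [{"id": 1}, {"id": "x"}, {"id": 2, "name": "a"}], buf)
print(schema)  # {'id': 'int', 'name': 'str'}
print(len(buf["users"]))  # 2
```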
ObjectExtractor Objects
class ObjectExtractor(Extractor)
Extracts Python object data items into typed jsonl
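As a rough illustration of writing Python objects as JSON lines, the sketch below serializes one object per line and turns values that plain JSON cannot hold (datetime, Decimal) into strings. This encoding is an assumption for illustration only; dlt's typed jsonl format has its own way of preserving such types, which is not reproduced here.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal
from typing import Any, Dict, Iterable


def _encode(value: Any) -> Any:
    # Called by json.dumps for objects it cannot serialize natively.
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, Decimal):
        return str(value)
    raise TypeError(f"cannot serialize {type(value).__name__}")


def to_jsonl(items: Iterable[Dict[str, Any]]) -> str:
    # One JSON document per line, each terminated by a newline.
    return "".join(json.dumps(item, default=_encode) + "\n" for item in items)


rows = [
    {"id": 1, "price": Decimal("9.99")},
    {"id": 2, "created": datetime(2024, 1, 1, tzinfo=timezone.utc)},
]
print(to_jsonl(rows), end="")
```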
ArrowExtractor Objects
class ArrowExtractor(Extractor)
Extracts arrow data items into parquet and normalizes the column names of arrow items.
Compares the arrow schema to the actual dlt table schema to reorder the columns and to
insert missing columns (without data). Adds a _dlt_load_id column to the table if
add_dlt_load_id is set to True in the normalizer config.
This work would otherwise be done by the normalizer; doing it here means the parquet files do not need to be loaded and saved again later.
Handles the following types:
pyarrow.Table
pyarrow.RecordBatch
pandas.DataFrame (converted to an arrow Table before processing)
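The column handling described for ArrowExtractor can be sketched without pyarrow by modelling a table as a dict of equally long column lists. normalize_name is a simplified, hypothetical snake_case rule (dlt's naming conventions are more involved), keeping unknown columns after the schema columns is an assumption, and the load id is a made-up value; the point is the order of operations: normalize names, reorder to the known schema, add missing columns as nulls, then optionally append _dlt_load_id.

```python
import re
from typing import Any, Dict, List, Optional

Table = Dict[str, List[Any]]


def normalize_name(name: str) -> str:
    # Simplified snake_case: "OrderID" -> "order_id", "Total Amount" -> "total_amount".
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return re.sub(r"\W+", "_", name).strip("_").lower()


def align_to_schema(
    table: Table, schema_columns: List[str], load_id: Optional[str] = None
) -> Table:
    # 1. Normalize incoming column names.
    normalized = {normalize_name(k): v for k, v in table.items()}
    n_rows = len(next(iter(normalized.values()), []))
    # 2. Known columns first, in schema order; missing ones are filled with nulls.
    result: Table = {col: normalized.get(col, [None] * n_rows) for col in schema_columns}
    # 3. Columns not yet in the schema are appended after the known ones.
    for col, values in normalized.items():
        if col not in result:
            result[col] = values
    # 4. Optionally stamp every row with the load id.
    if load_id is not None:
        result["_dlt_load_id"] = [load_id] * n_rows
    return result


incoming = {"Total Amount": [10, 20], "OrderID": [1, 2]}
out = align_to_schema(incoming, ["order_id", "customer", "total_amount"], load_id="1700000000.1")
print(list(out))  # ['order_id', 'customer', 'total_amount', '_dlt_load_id']
print(out["customer"])  # [None, None]
```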