# dlt.extract.extractors
## MaterializedEmptyList Objects

```python
class MaterializedEmptyList(List[Any])
```

A list variant that materializes tables even if an empty list was yielded.
## materialize_schema_item

```python
def materialize_schema_item() -> MaterializedEmptyList
```

Yield this to materialize the schema in the destination, even if there's no data.
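A minimal sketch of the mechanism, with illustrative names (the real check lives inside dlt's extract pipeline): because `MaterializedEmptyList` is a distinct subclass of `list`, an extractor can tell "no data" apart from "materialize an empty table":

```python
from typing import Any, List


class MaterializedEmptyList(List[Any]):
    """Empty-list marker that still requests table creation."""


def materialize_schema_item() -> MaterializedEmptyList:
    return MaterializedEmptyList()


def should_create_table(items: list) -> bool:
    # A plain empty list produces no table; the marker subclass does.
    return bool(items) or isinstance(items, MaterializedEmptyList)


assert should_create_table([]) is False
assert should_create_table(materialize_schema_item()) is True
assert should_create_table([{"id": 1}]) is True
```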
## with_file_import

```python
def with_file_import(
        file_path: str,
        file_format: TLoaderFileFormat,
        items_count: int = 0,
        hints: Union[TResourceHints, TDataItem] = None) -> DataItemWithMeta
```

Marks the file under `file_path` to be associated with the current resource and imported into the load package as a file of type `file_format`.

You can provide optional `hints` that will be applied to the current resource. Note that you should avoid schema inference at runtime if possible; if that is not possible, do it only once per extract process. Use `make_hints` in the `mark` module to create hints. You can also pass an Arrow table or a Pandas data frame, from which the schema will be taken (but the content discarded). Create `TResourceHints` with `make_hints`.

If the number of records in `file_path` is known, pass it in `items_count` so that dlt can generate correct extract metrics.

Note that dlt does not sniff schemas from data and will not guess the right file format for you.
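As a rough illustration of what `with_file_import` hands back, here is a hypothetical `DataItemWithMeta`-style wrapper that pairs a file path with its import metadata. The `FileImportMeta` class and its fields are assumptions made for this sketch, not dlt's actual internals:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FileImportMeta:
    # hypothetical metadata record, mirroring with_file_import's parameters
    file_path: str
    file_format: str          # e.g. "parquet" or "jsonl"
    items_count: int = 0      # known record count, for extract metrics
    hints: Optional[dict] = None


@dataclass
class DataItemWithMeta:
    meta: FileImportMeta
    data: Any = None          # content stays in the file, not in memory


def with_file_import(file_path, file_format, items_count=0, hints=None):
    return DataItemWithMeta(FileImportMeta(file_path, file_format, items_count, hints))


item = with_file_import("data/part-0001.parquet", "parquet", items_count=1000)
assert item.meta.file_format == "parquet"
assert item.meta.items_count == 1000
```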
## Extractor Objects

```python
class Extractor()
```

### write_items

```python
def write_items(resource: DltResource, items: TDataItems, meta: Any) -> None
```

Write `items` to `resource`, optionally computing table schemas and revalidating/filtering the data.
## ObjectExtractor Objects

```python
class ObjectExtractor(Extractor)
```

Extracts Python object data items into typed jsonl.
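Conceptually, jsonl output means one JSON document per line. A minimal sketch of that write path (dlt's typed jsonl additionally preserves type information for values such as timestamps):

```python
import io
import json


def write_jsonl(items, buf):
    # write each object as exactly one JSON line
    for item in items:
        buf.write(json.dumps(item) + "\n")


buf = io.StringIO()
write_jsonl([{"id": 1}, {"id": 2}], buf)
assert buf.getvalue() == '{"id": 1}\n{"id": 2}\n'
```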
## ModelExtractor Objects

```python
class ModelExtractor(Extractor)
```

Extracts text items and writes them row by row into a text file.
## ArrowExtractor Objects

```python
class ArrowExtractor(Extractor)
```

Extracts Arrow data items into parquet and normalizes the column names of Arrow items. Compares the Arrow schema to the actual dlt table schema in order to reorder the columns and to insert missing columns (without data). Adds the `_dlt_load_id` column to the table if `add_dlt_load_id` is set to True in the normalizer config. Work that the normalizer would otherwise do happens here, so the parquet files do not need to be loaded and saved again later.

Handles the following types:

- `pyarrow.Table`
- `pyarrow.RecordBatch`
- `pandas.DataFrame` (converted to an Arrow `Table` before processing)
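The schema-alignment step described above (reorder columns to match the dlt table schema and insert missing columns without data) can be sketched in plain Python, using a dict-of-lists as a stand-in for an Arrow table; the helper name is illustrative, not dlt's actual code:

```python
def align_columns(table: dict, target_order: list) -> dict:
    """Reorder columns to the target schema order; fill missing ones with nulls."""
    n_rows = len(next(iter(table.values()), []))
    return {
        col: table.get(col, [None] * n_rows)  # missing column -> all-null column
        for col in target_order
    }


# columns arrive out of order and "c" is missing entirely
t = {"b": [1, 2], "a": ["x", "y"]}
aligned = align_columns(t, ["a", "b", "c"])
assert list(aligned) == ["a", "b", "c"]
assert aligned["c"] == [None, None]
```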