Version: 0.5.4

extract.extractors

MaterializedEmptyList Objects

class MaterializedEmptyList(List[Any])

A list variant that materializes tables even when an empty list is yielded.

materialize_schema_item

def materialize_schema_item() -> MaterializedEmptyList

Yield this to materialize the schema in the destination, even if there is no data.
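
The marker-list idea described above can be sketched in plain Python. This is a simplified stand-in, not dlt's implementation: the class and function mirror the documented signatures, while `should_materialize` is a hypothetical helper added only to show how a consumer could tell the marker apart from an ordinary empty list.

```python
from typing import Any, List


class MaterializedEmptyList(List[Any]):
    """Stand-in for the documented class: an ordinary list with a distinct type."""


def materialize_schema_item() -> MaterializedEmptyList:
    # Returns an empty marker list, mirroring the documented signature.
    return MaterializedEmptyList()


def should_materialize(items: Any) -> bool:
    # Hypothetical helper: a plain empty list carries no signal,
    # but the marker type can be detected with isinstance.
    return isinstance(items, MaterializedEmptyList)
```

Because the marker is still a list, it compares equal to `[]` and has length 0; only its type distinguishes it.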

with_file_import

def with_file_import(
        file_path: str,
        file_format: TLoaderFileFormat,
        items_count: int = 0,
        hints: Union[TResourceHints, TDataItem] = None) -> DataItemWithMeta

Marks the file under file_path to be associated with the current resource and imported into the load package as a file of type file_format.

You can provide optional hints that will be applied to the current resource. Avoid schema inference at runtime if possible; if it is unavoidable, do it only once per extract process. Create TResourceHints with make_hints from the mark module. You can also pass an Arrow table or a Pandas data frame, from which the schema will be taken (the content is discarded).

If the number of records in file_path is known, pass it in items_count so dlt can generate correct extract metrics.

Note that dlt does not sniff schemas from data and will not guess the right file format for you.
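
Since items_count has to be supplied by the caller, one way to obtain it for a CSV file is to count the data rows before yielding the import marker. The sketch below uses only the standard library; `count_csv_records` and its `has_header` parameter are hypothetical names introduced for illustration, and the `"csv"` format value in the trailing comment is an assumption to check against the formats your dlt version supports.

```python
import csv
from pathlib import Path


def count_csv_records(file_path: str, has_header: bool = True) -> int:
    """Hypothetical helper: count non-empty data rows in a CSV file."""
    with Path(file_path).open(newline="", encoding="utf-8") as f:
        # Skip completely blank lines, which csv.reader yields as empty lists.
        rows = sum(1 for row in csv.reader(f) if row)
    # Exclude the header row from the count when one is present.
    return max(rows - 1, 0) if has_header else rows


# Inside a resource, the result would then be passed as items_count, e.g.:
#   yield with_file_import(file_path, "csv", items_count=count_csv_records(file_path))
```

Counting requires one extra pass over the file, so for very large files it may be preferable to take the count from whatever process produced the file.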

Extractor Objects

class Extractor()

write_items

def write_items(resource: DltResource, items: TDataItems, meta: Any) -> None

Writes items to the resource, optionally computing table schemas and revalidating/filtering data.

ObjectExtractor Objects

class ObjectExtractor(Extractor)

Extracts Python object data items into typed jsonl.

ArrowExtractor Objects

class ArrowExtractor(Extractor)

Extracts arrow data items into parquet. Normalizes the column names of arrow items. Compares the arrow schema to the actual dlt table schema to reorder the columns and to insert missing columns (without data). Adds the _dlt_load_id column to the table if add_dlt_load_id is set to True in the normalizer config.

This does work that the normalizer would otherwise do, so the parquet files do not need to be loaded and saved again later.

Handles the following types:

pyarrow.Table
pyarrow.RecordBatch
pandas.DataFrame (converted to an arrow Table before processing)
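
The column reordering and missing-column insertion described above can be illustrated without pyarrow. The sketch below works on a plain dict of column name to values and is not dlt's implementation: `align_to_schema` is a hypothetical name, and keeping columns that are absent from the schema at the end is an assumption made for the example.

```python
from typing import Any, Dict, List

Columns = Dict[str, List[Any]]


def align_to_schema(data: Columns, schema_columns: List[str]) -> Columns:
    """Reorder columns to follow schema_columns; fill absent ones with nulls."""
    num_rows = len(next(iter(data.values()))) if data else 0
    aligned: Columns = {}
    for name in schema_columns:
        # Columns known to the schema but absent from the data become null columns.
        aligned[name] = data.get(name, [None] * num_rows)
    # Assumption: columns present in the data but not in the schema go last.
    for name, values in data.items():
        aligned.setdefault(name, values)
    return aligned
```

For example, aligning columns `b`, `a`, `extra` against a schema of `a`, `b`, `c` yields the order `a`, `b`, `c`, `extra`, with `c` filled with `None` for every row.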
