# dlt.extract.extractors
## MaterializedEmptyList Objects

```python
class MaterializedEmptyList(List[Any])
```

A list variant that materializes tables even if an empty list was yielded.
## materialize_schema_item

```python
def materialize_schema_item() -> MaterializedEmptyList
```

Yield this to materialize the schema in the destination, even if there's no data.
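A minimal sketch of the mechanism, with illustrative names (the real check lives inside dlt's extract pipeline): because `MaterializedEmptyList` is a distinct subclass of `list`, an extractor can tell "no data" apart from "materialize an empty table":

```python
from typing import Any, List


class MaterializedEmptyList(List[Any]):
    """Empty-list marker that still requests table creation."""


def materialize_schema_item() -> MaterializedEmptyList:
    return MaterializedEmptyList()


def should_create_table(items: list) -> bool:
    # A plain empty list produces no table; the marker subclass does.
    return bool(items) or isinstance(items, MaterializedEmptyList)


assert should_create_table([]) is False
assert should_create_table(materialize_schema_item()) is True
assert should_create_table([{"id": 1}]) is True
```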
## with_file_import

```python
def with_file_import(
        file_path: str,
        file_format: TLoaderFileFormat,
        items_count: int = 0,
        hints: Union[TResourceHints, TDataItem] = None) -> DataItemWithMeta
```

Marks the file under `file_path` to be associated with the current resource and imported into the load package as a file of type `file_format`.

You can provide optional `hints` that will be applied to the current resource. Note that you should avoid schema inference at runtime if possible; if that is not possible, do it only once per extract process. Use `make_hints` in the `mark` module to create hints. You can also pass an Arrow table or a Pandas data frame, from which the schema will be taken (but the content discarded). Create `TResourceHints` with `make_hints`.

If the number of records in `file_path` is known, pass it in `items_count` so that dlt can generate correct extract metrics.

Note that dlt does not sniff schemas from data and will not guess the right file format for you.
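As a rough illustration of what `with_file_import` hands back, here is a hypothetical `DataItemWithMeta`-style wrapper that pairs a file path with its import metadata. The `FileImportMeta` class and its fields are assumptions made for this sketch, not dlt's actual internals:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FileImportMeta:
    # hypothetical metadata record, mirroring with_file_import's parameters
    file_path: str
    file_format: str          # e.g. "parquet" or "jsonl"
    items_count: int = 0      # known record count, for extract metrics
    hints: Optional[dict] = None


@dataclass
class DataItemWithMeta:
    meta: FileImportMeta
    data: Any = None          # content stays in the file, not in memory


def with_file_import(file_path, file_format, items_count=0, hints=None):
    return DataItemWithMeta(FileImportMeta(file_path, file_format, items_count, hints))


item = with_file_import("data/part-0001.parquet", "parquet", items_count=1000)
assert item.meta.file_format == "parquet"
assert item.meta.items_count == 1000
```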
## Extractor Objects

```python
class Extractor()
```

### write_items

```python
def write_items(resource: DltResource, items: TDataItems, meta: Any) -> None
```

Write `items` to `resource`, optionally computing table schemas and revalidating/filtering the data.
## ObjectExtractor Objects

```python
class ObjectExtractor(Extractor)
```

Extracts Python object data items into typed jsonl.
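Conceptually, jsonl output means one JSON document per line. A minimal sketch of that write path (dlt's typed jsonl additionally preserves type information for values such as timestamps):

```python
import io
import json


def write_jsonl(items, buf):
    # write each object as exactly one JSON line
    for item in items:
        buf.write(json.dumps(item) + "\n")


buf = io.StringIO()
write_jsonl([{"id": 1}, {"id": 2}], buf)
assert buf.getvalue() == '{"id": 1}\n{"id": 2}\n'
```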
## ModelExtractor Objects

```python
class ModelExtractor(Extractor)
```

Extracts text items and writes them row by row into a text file.
## ArrowExtractor Objects

```python
class ArrowExtractor(Extractor)
```

Extracts Arrow data items into parquet and normalizes the column names of Arrow items. Compares the Arrow schema to the actual dlt table schema in order to reorder the columns and to insert missing columns (without data). Adds the `_dlt_load_id` column to the table if `add_dlt_load_id` is set to True in the normalizer config. Work that the normalizer would otherwise do happens here, so the parquet files do not need to be loaded and saved again later.

Handles the following types:

- `pyarrow.Table`
- `pyarrow.RecordBatch`
- `pandas.DataFrame` (converted to an Arrow `Table` before processing)
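The schema-alignment step described above (reorder columns to match the dlt table schema and insert missing columns without data) can be sketched in plain Python, using a dict-of-lists as a stand-in for an Arrow table; the helper name is illustrative, not dlt's actual code:

```python
def align_columns(table: dict, target_order: list) -> dict:
    """Reorder columns to the target schema order; fill missing ones with nulls."""
    n_rows = len(next(iter(table.values()), []))
    return {
        col: table.get(col, [None] * n_rows)  # missing column -> all-null column
        for col in target_order
    }


# columns arrive out of order and "c" is missing entirely
t = {"b": [1, 2], "a": ["x", "y"]}
aligned = align_columns(t, ["a", "b", "c"])
assert list(aligned) == ["a", "b", "c"]
assert aligned["c"] == [None, None]
```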