extract.extract
data_to_sources
def data_to_sources(
    data: Any,
    pipeline: SupportsPipeline,
    *,
    schema: Schema = None,
    table_name: str = None,
    parent_table_name: str = None,
    write_disposition: TWriteDispositionConfig = None,
    columns: TAnySchemaColumns = None,
    primary_key: TColumnNames = None,
    table_format: TTableFormat = None,
    schema_contract: TSchemaContract = None) -> List[DltSource]
Creates a list of sources for the data items present in the data argument and applies the specified hints to all resources.
The data argument may be a DltSource, a DltResource, a list of those, or any other data type accepted by pipeline.run.
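The normalization described above can be pictured with a small, library-free sketch. It does not import dlt and is not the library's implementation: SketchResource, SketchSource, to_sources_sketch, and the "default" source name are hypothetical stand-ins, and the grouping of loose resources into a single source is an assumption made for illustration. Only two of the hints (table_name and write_disposition) are modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchResource:
    """Hypothetical stand-in for DltResource: a name, some data, and hints."""
    name: str
    data: Any
    hints: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SketchSource:
    """Hypothetical stand-in for DltSource: a named group of resources."""
    name: str
    resources: List[SketchResource]


def to_sources_sketch(
    data: Any,
    *,
    table_name: Optional[str] = None,
    write_disposition: Optional[str] = None,
) -> List[SketchSource]:
    """Wrap `data` into a list of sources and apply hints to every resource."""
    # Keep only the hints the caller actually provided.
    candidate_hints = {"table_name": table_name, "write_disposition": write_disposition}
    hints = {key: value for key, value in candidate_hints.items() if value is not None}

    sources: List[SketchSource] = []
    loose: List[SketchResource] = []  # resources that do not belong to a source yet

    def visit(item: Any) -> None:
        if isinstance(item, SketchSource):
            sources.append(item)
        elif isinstance(item, SketchResource):
            loose.append(item)
        else:
            # Any other data type is wrapped in a new resource (assumed naming rule).
            loose.append(SketchResource(name=table_name or "data", data=item))

    # A non-empty list made only of sources/resources is unpacked item by item;
    # everything else (including a plain list of rows) is treated as one data item.
    is_list_of_known = (
        isinstance(data, list)
        and len(data) > 0
        and all(isinstance(item, (SketchSource, SketchResource)) for item in data)
    )
    if is_list_of_known:
        for item in data:
            visit(item)
    else:
        visit(data)

    # Assumption: all loose resources are grouped into one source named "default".
    if loose:
        sources.append(SketchSource(name="default", resources=loose))

    # Apply the same hints to all resources (this mutates the passed-in objects).
    for source in sources:
        for resource in source.resources:
            resource.hints.update(hints)
    return sources
```

In this sketch, passing a list that mixes a source and a resource yields two sources, while passing a plain list of rows yields one source holding a single resource named after table_name.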
describe_extract_data
def describe_extract_data(data: Any) -> List[ExtractDataInfo]
Extracts the source and resource names from the data passed to extract.
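As a rough illustration of what describing extract data could look like, the sketch below collects a name and a kind for each item. It is library-free: SketchSource, SketchResource, describe_sketch, and the "name" / "data_type" keys of the returned dictionaries are assumptions made for this example and are not taken from the ExtractDataInfo definition.

```python
from typing import Any, Dict, List


class SketchSource:
    """Hypothetical stand-in for DltSource."""
    def __init__(self, name: str) -> None:
        self.name = name


class SketchResource:
    """Hypothetical stand-in for DltResource."""
    def __init__(self, name: str) -> None:
        self.name = name


def describe_sketch(data: Any) -> List[Dict[str, str]]:
    """Return one small description per source, resource, or raw data item."""

    def describe_one(item: Any) -> Dict[str, str]:
        if isinstance(item, SketchSource):
            return {"name": item.name, "data_type": "source"}
        if isinstance(item, SketchResource):
            return {"name": item.name, "data_type": "resource"}
        # Raw data has no name; report its Python type instead (assumed behavior).
        return {"name": "", "data_type": type(item).__name__}

    # Unpack only lists made entirely of sources/resources; describe anything
    # else as a single item.
    if isinstance(data, list) and data and all(
        isinstance(item, (SketchSource, SketchResource)) for item in data
    ):
        return [describe_one(item) for item in data]
    return [describe_one(data)]
```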
Extract Objects
class Extract(WithStepInfo[ExtractMetrics, ExtractInfo])
original_data
The original data from which the extracted DltSource was created. It is used to describe the data in the extract info.
__init__
def __init__(schema_storage: SchemaStorage,
             normalize_storage_config: NormalizeStorageConfiguration,
             collector: Collector = NULL_COLLECTOR,
             original_data: Any = None) -> None
Optionally saves the originally passed data as original_data so that it can be used to generate the extract info.
commit_packages
def commit_packages() -> None
Commits all extracted packages to normalize storage