extract.resource
with_table_name
def with_table_name(item: TDataItems, table_name: str) -> DataItemWithMeta
Marks item to be dispatched to table table_name when yielded from a resource function.
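A minimal sketch of routing items to different tables with this marker; the import path follows this module and the table and field names are made up:

```python
import dlt
from dlt.extract.resource import with_table_name

@dlt.resource
def events():
    for event in [{"kind": "click", "x": 10}, {"kind": "purchase", "amount": 9.99}]:
        # dispatch each item to a table named after its "kind" field
        yield with_table_name(event, event["kind"])
```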
with_hints
def with_hints(item: TDataItems,
               hints: TResourceHints,
               create_table_variant: bool = False) -> DataItemWithMeta
Marks item to update the resource with the specified hints. Will create a separate variant of hints for a table if name is provided in hints and create_table_variant is set. Create TResourceHints with make_hints. Setting table_name will dispatch the item to the specified table, just like with_table_name.
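A sketch of attaching hints to yielded items. The import location of make_hints is an assumption here, and the hint values are illustrative:

```python
import dlt
from dlt.extract.resource import with_hints
from dlt.extract.hints import make_hints  # assumed location of make_hints

@dlt.resource
def users():
    hints = make_hints(primary_key="id", write_disposition="merge")
    for row in [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]:
        # attach the hints to the yielded item; they update this resource's table
        yield with_hints(row, hints)
```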
DltResource Objects
class DltResource(Iterable[TDataItem], DltResourceHints)
Implements a dlt resource. Contains a data pipe that wraps the generating item and a table schema that can be adjusted.
source_name
Name of the source that contains this instance of the resource; set when the resource is added to DltResourcesDict.
section
A config section name
SPEC
A SPEC that defines the signature of a callable (parametrized) resource or transformer.
from_data
@classmethod
def from_data(cls,
              data: Any,
              name: str = None,
              section: str = None,
              hints: TResourceHints = None,
              selected: bool = True,
              data_from: Union["DltResource", Pipe] = None,
              inject_config: bool = False) -> Self
Creates an instance of DltResource from compatible data with a given name and section.
Internally (in the most common case) a new instance of Pipe with name is created from data and optionally connected to an existing pipe data_from to form a transformer (dependent resource).
If inject_config is set to True and data is a callable, the callable is wrapped in incremental and config injection wrappers.
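A hedged sketch of building a resource directly from a generator function; most pipelines would reach the same result through the @dlt.resource decorator instead:

```python
import dlt
from dlt.extract.resource import DltResource

def numbers():
    yield from ({"n": n} for n in range(3))

# build a resource directly from a generator function and give it a name
res = DltResource.from_data(numbers, name="numbers")
```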
name
@property
def name() -> str
Resource name inherited from the pipe
with_name
def with_name(new_name: str) -> TDltResourceImpl
Clones the resource with a new name. Such a resource keeps separate state and loads data to the new_name table by default.
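For example, one resource definition can be cloned to load into a differently named table with independent state (the names here are illustrative):

```python
import dlt

@dlt.resource
def fruits():
    yield from [{"name": "apple"}, {"name": "pear"}]

# the clone keeps its own state and loads to the "exotic_fruits" table by default
exotic_fruits = fruits.with_name("exotic_fruits")
```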
is_transformer
@property
def is_transformer() -> bool
Checks if the resource is a transformer that takes data from another resource
requires_args
@property
def requires_args() -> bool
Checks if resource has unbound arguments
incremental
@property
def incremental() -> Optional[IncrementalResourceWrapper]
Gets incremental transform if it is in the pipe
validator
@property
def validator() -> Optional[ValidateItem]
Gets validator transform if it is in the pipe
validator
@validator.setter
def validator(validator: Optional[ValidateItem]) -> None
Adds, removes, or replaces the validator in the pipe.
max_table_nesting
@property
def max_table_nesting() -> Optional[int]
A schema hint for the resource that sets the maximum depth of nested tables, above which the remaining nodes are loaded as structs or JSON.
pipe_data_from
def pipe_data_from(data_from: Union[TDltResourceImpl, Pipe]) -> None
Replaces the parent in the transformer resource pipe from which the data is piped.
add_pipe
def add_pipe(data: Any) -> None
Creates an additional pipe for the resource from the specified data.
select_tables
def select_tables(*table_names: Iterable[str]) -> TDltResourceImpl
For resources that dynamically dispatch data to several tables, allows selecting the tables that will receive data, effectively filtering out other data items.
Both the with_table_name marker and data-based (function) table name hints are supported.
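A short sketch combining with_table_name dispatch with select_tables; the table names are made up:

```python
import dlt
from dlt.extract.resource import with_table_name

@dlt.resource
def documents():
    yield with_table_name({"id": 1, "total": 100}, "invoices")
    yield with_table_name({"id": 2, "total": 20}, "receipts")

# keep only items dispatched to "invoices"; "receipts" items are filtered out
invoices_only = documents.select_tables("invoices")
```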
add_map
def add_map(item_map: ItemTransformFunc[TDataItem],
            insert_at: int = None) -> TDltResourceImpl
Adds the mapping function defined in item_map to the resource pipe at position insert_at. item_map receives single data items; dlt will enumerate any lists of data items automatically.
Arguments:
item_map
ItemTransformFunc[TDataItem] - A function taking a single data item and an optional meta argument. Returns the transformed data item.
insert_at
int, optional - At which step in the pipe to insert the mapping. Defaults to None, which inserts after the last step.
Returns:
"DltResource"
- returns self
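A minimal sketch of a mapping step; the transform function and field names are illustrative:

```python
import dlt

@dlt.resource
def users():
    yield from [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

def anonymize(item):
    # called once per data item; lists of items are enumerated automatically
    item["name"] = item["name"][0] + "***"
    return item

users.add_map(anonymize)
```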
add_yield_map
def add_yield_map(item_map: ItemTransformFunc[Iterator[TDataItem]],
                  insert_at: int = None) -> TDltResourceImpl
Adds the generating function defined in item_map to the resource pipe at position insert_at. item_map receives single data items; dlt will enumerate any lists of data items automatically. It may yield 0 or more data items and can be used, for example, to pivot an item into a sequence of rows.
Arguments:
item_map
ItemTransformFunc[Iterator[TDataItem]] - A function taking a single data item and an optional meta argument. Yields 0 or more data items.
insert_at
int, optional - At which step in the pipe to insert the generator. Defaults to None, which inserts after the last step.
Returns:
"DltResource"
- returns self
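For instance, a yield map can pivot one nested item into several rows; a sketch with made-up field names:

```python
import dlt

@dlt.resource
def orders():
    yield {"order_id": 1, "lines": [{"sku": "a"}, {"sku": "b"}]}

def explode_lines(order):
    # yield one row per order line instead of the nested order item
    for line in order["lines"]:
        yield {"order_id": order["order_id"], **line}

orders.add_yield_map(explode_lines)
```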
add_filter
def add_filter(item_filter: ItemTransformFunc[bool],
               insert_at: int = None) -> TDltResourceImpl
Adds the filter defined in item_filter to the resource pipe at position insert_at. item_filter receives single data items; dlt will enumerate any lists of data items automatically.
Arguments:
item_filter
ItemTransformFunc[bool] - A function taking a single data item and an optional meta argument. Returns bool. If True, the item is kept.
insert_at
int, optional - At which step in the pipe to insert the filter. Defaults to None, which inserts after the last step.
Returns:
"DltResource"
- returns self
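A small sketch of a filter step; the field names are illustrative:

```python
import dlt

@dlt.resource
def readings():
    yield from [{"sensor": 1, "value": 7}, {"sensor": 2, "value": -1}]

# keep only non-negative readings; the filter sees one item at a time
readings.add_filter(lambda item: item["value"] >= 0)
```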
add_limit
def add_limit(max_items: Optional[int] = None,
              max_time: Optional[float] = None) -> TDltResourceImpl
Adds a limit max_items to the resource pipe. This mutates the encapsulated generator to stop after max_items items are yielded. This is useful for testing and debugging.
Notes:
- Transformers won't be limited. They should process all the data they receive fully to avoid inconsistencies in generated datasets.
- Each yielded item may contain several records. add_limit only limits the "number of yields", not the total number of records.
- Async resources with a limit added may occasionally produce one item more than the limit on some runs. This behavior is not deterministic.
Arguments:
max_items
int - The maximum number of items to yield. Set to None for no limit.
max_time
float - The maximum number of seconds for this generator to run after it was opened. Set to None for no limit.
Returns:
"DltResource"
- returns self
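A sketch of limiting an endless paginator for a quick test run; the resource itself is made up:

```python
import dlt

@dlt.resource
def pages():
    page = 0
    while True:  # endless pagination loop
        page += 1
        yield {"page": page}

# stop after 5 yields or after 2 seconds, whichever comes first
limited = pages.add_limit(max_items=5, max_time=2.0)
```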
parallelize
def parallelize() -> TDltResourceImpl
Wraps the resource to execute each item in a thread pool, allowing multiple resources to extract in parallel.
The resource must be a generator, a generator function, or a transformer function.
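A sketch of two generator resources marked for parallel extraction; the pipeline setup is illustrative:

```python
import dlt

@dlt.resource
def users():
    for i in range(3):
        yield {"user_id": i}

@dlt.resource
def orders():
    for i in range(3):
        yield {"order_id": i}

# both generators can now be evaluated in the extract thread pool side by side
users.parallelize()
orders.parallelize()

pipeline = dlt.pipeline(pipeline_name="shop", destination="duckdb")
pipeline.run([users, orders])
```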
set_incremental
def set_incremental(
    new_incremental: Union[Incremental[Any], IncrementalResourceWrapper],
    from_hints: bool = False
) -> Optional[Union[Incremental[Any], IncrementalResourceWrapper]]
Set/replace the incremental transform for the resource.
Arguments:
new_incremental
- The Incremental instance/hint to set or replace.
from_hints
- Whether the incremental is set from hints. Defaults to False.
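A hedged sketch of swapping the incremental cursor on an existing resource; the cursor field and initial value are made up:

```python
import dlt

@dlt.resource
def events(updated_at=dlt.sources.incremental("updated_at")):
    yield {"id": 1, "updated_at": "2024-01-01T00:00:00Z"}

# replace the incremental transform with a cursor starting at a later value
events.set_incremental(
    dlt.sources.incremental("updated_at", initial_value="2024-06-01T00:00:00Z")
)
```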
bind
def bind(*args: Any, **kwargs: Any) -> TDltResourceImpl
Binds the parametrized resource to passed arguments. Modifies resource pipe in place. Does not evaluate generators or iterators.
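For example, a parametrized resource can be bound in place before being passed to a pipeline; the argument values are illustrative:

```python
import dlt

@dlt.resource
def repo_issues(repo: str, state: str = "open"):
    yield {"repo": repo, "state": state}

# bind the arguments in place; the generator is not evaluated yet
repo_issues.bind("dlt-hub/dlt", state="all")
```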
args_bound
@property
def args_bound() -> bool
Returns True if the resource's parameters are bound to values. Such a resource cannot be called again. Note that resources are lazily evaluated and arguments are only formally checked; configuration has not yet been injected at this point.
explicit_args
@property
def explicit_args() -> StrAny
Returns a dictionary of arguments used to parametrize the resource. Does not include defaults and injected args.
state
@property
def state() -> StrAny
Gets resource-scoped state from the active pipeline. PipelineStateNotAvailable is raised if the pipeline context is not available.
__call__
def __call__(*args: Any, **kwargs: Any) -> TDltResourceImpl
Binds the parametrized resource to the passed arguments. Creates and returns a bound resource. Generators and iterators are not evaluated.
__or__
def __or__(transform: Union["DltResource", AnyFun]) -> "DltResource"
Allows piping data across resources and transform functions with the | operator. This is the LEFT side OR, so self may be a resource or a transformer.
__ror__
def __ror__(data: Union[Iterable[Any], Iterator[Any]]) -> TDltResourceImpl
Allows piping data across resources and transform functions with the | operator. This is the RIGHT side OR, so self may not be a resource and the LEFT side must be an object that does not implement |, e.g. a list.
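A sketch of both sides of the | operator, assuming a simple transformer defined with @dlt.transformer:

```python
import dlt

@dlt.resource
def numbers():
    yield from [1, 2, 3]

@dlt.transformer
def squared(n):
    yield {"n": n, "square": n * n}

# LEFT side: a resource piped into a transformer (__or__)
pipe_from_resource = numbers | squared

# RIGHT side: a plain list piped into a transformer (__ror__)
pipe_from_list = [4, 5, 6] | squared
```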
__iter__
def __iter__() -> Iterator[TDataItem]
Opens an iterator that yields the data items from the resource in the same order as the Pipeline class does.
A read-only state is provided, initialized from the active pipeline state. The state is discarded after the iterator is closed.
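Iterating a resource evaluates it eagerly outside of a pipeline run, which is handy for quick inspection; a minimal sketch:

```python
import dlt

@dlt.resource
def numbers():
    yield from [{"n": 1}, {"n": 2}]

# __iter__ lets the resource be materialized directly, e.g. into a list
rows = list(numbers)
print(rows)
```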