sources.helpers.transform
take_first
def take_first(max_items: int) -> ItemTransformFunctionNoMeta[bool]
A filter that takes only the first max_items items from a resource.
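The behavior can be illustrated with a minimal standard-library sketch. The name take_first_sketch and the plain Callable return type are assumptions made for illustration; this is not the library's implementation, only a stateful predicate that behaves as the description says.

```python
from typing import Any, Callable


def take_first_sketch(max_items: int) -> Callable[[Any], bool]:
    """Return a predicate that is True only for the first max_items calls."""
    count = 0

    def _filter(_item: Any) -> bool:
        nonlocal count
        count += 1
        # Items are kept while the running count has not passed the limit.
        return count <= max_items

    return _filter


keep = take_first_sketch(2)
taken = [item for item in ["a", "b", "c", "d"] if keep(item)]
print(taken)
```

The predicate keeps its own counter, so a fresh one should be created for every pass over the data.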
skip_first
def skip_first(max_items: int) -> ItemTransformFunctionNoMeta[bool]
A filter that skips the first max_items items from a resource.
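A minimal standard-library sketch of the skipping behavior follows. The name skip_first_sketch is hypothetical and the code is an illustration of the description above, not the library's implementation.

```python
from typing import Any, Callable


def skip_first_sketch(max_items: int) -> Callable[[Any], bool]:
    """Return a predicate that is False for the first max_items calls."""
    count = 0

    def _filter(_item: Any) -> bool:
        nonlocal count
        count += 1
        # Items pass only once the running count has moved past the limit.
        return count > max_items

    return _filter


skip = skip_first_sketch(2)
remaining = [item for item in [1, 2, 3, 4, 5] if skip(item)]
print(remaining)
```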
pivot
def pivot(paths: Union[str, Sequence[str]] = "$",
prefix: str = "col") -> ItemTransformFunctionNoMeta[TDataItem]
Pivot the given sequence of sequences into a sequence of dicts, generating column names from the given prefix and indexes, e.g.: {"field": [[1, 2]]} -> {"field": [{"prefix_0": 1, "prefix_1": 2}]}
Arguments:
paths
Union[str, Sequence[str]] - JSON paths of the fields to pivot.
prefix
Optional[str] - Prefix to add to the column names.
Returns:
ItemTransformFunctionNoMeta[TDataItem] - The transformer function.
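The reshaping described above can be sketched in plain Python. This sketch assumes a single top-level field name rather than a JSON path, and the helpers pivot_rows and pivot_field_sketch are hypothetical names. It follows the prefix_index naming shown in the example and is not the library's implementation.

```python
from typing import Any, Dict, List, Sequence


def pivot_rows(rows: Sequence[Sequence[Any]], prefix: str = "col") -> List[Dict[str, Any]]:
    """Turn each inner sequence into a dict keyed by <prefix>_<index>."""
    return [
        {f"{prefix}_{index}": value for index, value in enumerate(row)}
        for row in rows
    ]


def pivot_field_sketch(item: Dict[str, Any], field: str, prefix: str = "col") -> Dict[str, Any]:
    """Return a copy of item with the given top-level field pivoted."""
    pivoted = dict(item)  # shallow copy so the input item is left untouched
    pivoted[field] = pivot_rows(item[field], prefix)
    return pivoted


item = {"field": [[1, 2], [3, 4]]}
result = pivot_field_sketch(item, "field")
print(result)
```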
add_row_hash_to_table
def add_row_hash_to_table(row_hash_column_name: str) -> TDataItem
Computes a content hash for each row of a pandas DataFrame, arrow table or batch and adds it as the row_hash_column_name column.
Internally, arrow tables and batches are converted to a pandas DataFrame, and hash_pandas_object is then used to generate a series with row hashes. The hashes are converted to signed int64 and added to the original table. Data may be modified.
For SCD2, use it with a resource configuration that sets the custom row version column to row_hash_column_name.
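The idea of a per-row content hash stored as a signed 64-bit integer can be sketched with the standard library alone. Everything here is an assumption for illustration: rows are plain dicts instead of a DataFrame or arrow table, hashlib.blake2b stands in for hash_pandas_object, and the names row_hash_int64 and add_row_hash_sketch are hypothetical. The hash values will not match those the helper produces.

```python
import hashlib
import json
from typing import Any, Dict, List


def row_hash_int64(row: Dict[str, Any]) -> int:
    """Hash the row content into 8 bytes and read them as a signed int64."""
    # Sorting keys makes the hash independent of dict insertion order.
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    return int.from_bytes(digest, byteorder="big", signed=True)


def add_row_hash_sketch(rows: List[Dict[str, Any]], row_hash_column_name: str) -> List[Dict[str, Any]]:
    """Return copies of the rows with the hash stored under the given column name."""
    return [{**row, row_hash_column_name: row_hash_int64(row)} for row in rows]


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 1, "name": "a"}]
hashed = add_row_hash_sketch(rows, "row_hash")
for row in hashed:
    print(row)
```

Rows with identical content receive the same hash, which is the property a row version column relies on to detect changed records.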