dlt.common.libs.pyarrow
UnsupportedArrowTypeException Objects
class UnsupportedArrowTypeException(DltException)
Exception raised when an Arrow type conversion fails.
The setters are used to update the exception with more context, such as the relevant field and table, as it is caught downstream.
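The setter pattern described above can be sketched in plain Python. The lowest-level code raises the exception knowing only the type. Each caller that catches it attaches the context it knows and re-raises. The class `UnsupportedTypeError`, its `set_field`/`set_table` methods, the `convert_*` helpers and the "duration is unsupported" rule are all hypothetical names chosen for illustration. The real dlt exception's setter names and attributes may differ.

```python
from typing import Dict, Optional


class UnsupportedTypeError(Exception):
    """Hypothetical stand-in for an exception that is enriched with context downstream."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message
        self.field_name: Optional[str] = None
        self.table_name: Optional[str] = None

    def set_field(self, field_name: str) -> None:
        # called by code that knows which field was being converted
        self.field_name = field_name

    def set_table(self, table_name: str) -> None:
        # called further up the stack, where the table is known
        self.table_name = table_name

    def __str__(self) -> str:
        parts = [self.message]
        if self.table_name is not None:
            parts.append(f"table={self.table_name}")
        if self.field_name is not None:
            parts.append(f"field={self.field_name}")
        return ", ".join(parts)


def convert_type(type_name: str) -> str:
    # hypothetical rule for illustration only: treat "duration" as unsupported
    if type_name == "duration":
        raise UnsupportedTypeError(f"unsupported arrow type: {type_name}")
    return type_name


def convert_field(field_name: str, type_name: str) -> str:
    try:
        return convert_type(type_name)
    except UnsupportedTypeError as exc:
        exc.set_field(field_name)  # add field context, keep the original traceback
        raise


def convert_table(table_name: str, fields: Dict[str, str]) -> Dict[str, str]:
    try:
        return {name: convert_field(name, t) for name, t in fields.items()}
    except UnsupportedTypeError as exc:
        exc.set_table(table_name)  # add table context before propagating further
        raise
```

Because the same exception object is re-raised, the final handler sees both the field and the table without any layer having to construct a new exception.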
PyToArrowConversionException Objects
class PyToArrowConversionException(DltException)
Exception raised when converting data to Arrow based on a TableSchema fails.
ArrowSchemaNormalizationResult Objects
class ArrowSchemaNormalizationResult(NamedTuple)
Named result returned by should_normalize_arrow_schema.
Fields:
- should_normalize: whether any normalization is required
- rename_mapping: mapping from original field names to normalized names
- rev_mapping: reverse mapping from normalized names to original
- nullable_updates: fields that require nullable flag updates (by normalized name)
- columns: potentially filtered/adjusted TTableSchemaColumns
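A minimal sketch of a NamedTuple with the fields listed above may help show how rename_mapping and rev_mapping relate. The field types, the class name `NormalizationResult`, the `build_result` helper and the rule for `should_normalize` (true when any name changes or any nullable update is pending) are assumptions. The documentation only names the fields.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Set


class NormalizationResult(NamedTuple):
    # field types are assumptions; only the field names come from the docs
    should_normalize: bool
    rename_mapping: Dict[str, str]
    rev_mapping: Dict[str, str]
    nullable_updates: Set[str]
    columns: Dict[str, Dict[str, Any]]


def build_result(
    original_names: List[str],
    normalize: Callable[[str], str],
    nullable_updates: Set[str],
    columns: Dict[str, Dict[str, Any]],
) -> NormalizationResult:
    # original name -> normalized name
    rename_mapping = {name: normalize(name) for name in original_names}
    # normalized name -> original name (a name collision would keep only the last one)
    rev_mapping = {norm: orig for orig, norm in rename_mapping.items()}
    renamed = any(orig != norm for orig, norm in rename_mapping.items())
    return NormalizationResult(
        should_normalize=renamed or bool(nullable_updates),
        rename_mapping=rename_mapping,
        rev_mapping=rev_mapping,
        nullable_updates=nullable_updates,
        columns=columns,
    )


def snake_case(name: str) -> str:
    # hypothetical, very simplified naming convention
    return name.strip().lower().replace(" ", "_")
```

Being a NamedTuple, the result can be read by attribute (`result.rev_mapping`) or unpacked positionally.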
get_column_type_from_py_arrow
def get_column_type_from_py_arrow(dtype: pyarrow.DataType) -> TColumnType
Returns a (data_type, precision, scale) tuple for the given pyarrow.DataType.
py_arrow_to_table_schema_columns
def py_arrow_to_table_schema_columns(
schema: pyarrow.Schema) -> TTableSchemaColumns
Convert a PyArrow schema to a table schema columns dict.
Arguments:
schema (pyarrow.Schema) - pyarrow schema
Returns:
TTableSchemaColumns - table schema columns
get_nested_column_type_from_py_arrow
def get_nested_column_type_from_py_arrow(
dtype: pyarrow.DataType) -> TColumnType
Creates a json dlt data type with the nested type structure stored in the x-nested-type hint.
Currently the only recognized nested type format is arrow-ipc.
serialize_type
def serialize_type(dtype: pyarrow.DataType) -> str
Serializes an Arrow type via Arrow IPC as a base64 string.
remove_null_columns
def remove_null_columns(item: TAnyArrowItem) -> TAnyArrowItem
Remove all columns of datatype pyarrow.null() from the table or record batch. Stores removed column names in arrow schema metadata under the 'dlt.null_columns' key.
remove_null_columns_from_schema
def remove_null_columns_from_schema(
schema: pyarrow.Schema) -> Tuple[pyarrow.Schema, bool]
Remove all columns of datatype pyarrow.null() from the schema