Adjust a schema
When you create and then run a pipeline, you may want to manually inspect and change the schema that dlt generated for you. Here's how you do it.
1. Export your schemas on each run
Set up an export folder, where dlt saves the schema, by providing the export_schema_path argument to dlt.pipeline. Set up an import folder, from which dlt will read your modifications, by providing the import_schema_path argument.
Following our example in "Run a pipeline":
import dlt

pipeline = dlt.pipeline(
    import_schema_path="schemas/import",
    export_schema_path="schemas/export",
    pipeline_name="chess_pipeline",
    destination="duckdb",
    dataset_name="games_data",
)
The following folder structure will be created in the project root folder:
schemas/
|---import/
|---export/
Rather than providing the paths in the dlt.pipeline function, you can also set them at the beginning of the config.toml file:
export_schema_path="schemas/export"
import_schema_path="schemas/import"
2. Run the pipeline to see the schemas
To see the schemas, you must run your pipeline again. The schemas directory and its import/export subdirectories will be created. In each directory, you'll see a YAML file (e.g., chess.schema.yaml).
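After the run, you can list the schema files from Python. The sketch below uses a hypothetical helper, find_schema_files. It only inspects the file system and assumes the schemas/import and schemas/export layout shown above, along with the *.schema.yaml naming from the chess.schema.yaml example. It is not part of dlt.

```python
from pathlib import Path


def find_schema_files(schemas_root: str = "schemas") -> dict[str, list[Path]]:
    """Return the *.schema.yaml files found in the import/ and export/ folders.

    Hypothetical helper: it assumes the layout schemas/import and schemas/export
    and the <name>.schema.yaml file naming; it does not call dlt.
    """
    root = Path(schemas_root)
    found: dict[str, list[Path]] = {}
    for kind in ("import", "export"):
        folder = root / kind
        # A missing folder yields an empty list; sorted() keeps the order stable
        found[kind] = sorted(folder.glob("*.schema.yaml")) if folder.is_dir() else []
    return found


if __name__ == "__main__":
    for kind, files in find_schema_files().items():
        print(f"{kind}: {[f.name for f in files]}")
```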
Look at the export schema (in the export folder). This is the schema that was inferred from the data and used to load it into the destination (e.g., duckdb).
3. Make changes in the import schema
Now look at the import schema (in the import folder). It contains only the tables, columns, and hints that were explicitly declared in the chess source. You'll use this schema to make modifications, typically by pasting relevant snippets from your export schema and modifying them.
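The import schema holds only what was explicitly declared, while the export schema also holds everything dlt inferred. Comparing the two shows which columns are candidates for pasting. The sketch below uses a hypothetical helper and assumes both YAML files have already been parsed into Python dicts, for example with a YAML library. It also assumes the dicts follow a tables → table name → columns → column name layout, which you should check against your actual files. The table and column names in the toy data are invented for illustration and are not taken from a real chess schema.

```python
def inferred_only_columns(export_schema: dict, import_schema: dict) -> dict[str, list[str]]:
    """List columns present in the export schema but absent from the import schema.

    Hypothetical helper. Assumes both schemas are dicts shaped like
    {"tables": {table_name: {"columns": {column_name: {...hints...}}}}}.
    """
    result: dict[str, list[str]] = {}
    import_tables = import_schema.get("tables", {})
    for table_name, table in export_schema.get("tables", {}).items():
        declared = import_tables.get(table_name, {}).get("columns", {})
        missing = [name for name in table.get("columns", {}) if name not in declared]
        if missing:
            result[table_name] = missing
    return result


# Toy data for illustration only, not a real dlt schema
export_schema = {
    "tables": {
        "games": {
            "columns": {
                "end_time": {"data_type": "timestamp", "nullable": True},
                "url": {"data_type": "text", "nullable": True},
            }
        }
    }
}
import_schema = {"tables": {"games": {"columns": {"end_time": {"data_type": "timestamp"}}}}}

print(inferred_only_columns(export_schema, import_schema))  # {'games': ['url']}
```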
You should keep the import schema as simple as possible and let dlt do the rest.
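One way to keep the import schema small is to copy only the single column you want to change from the export schema, then adjust its hints. The sketch below does this on parsed dicts with a hypothetical override_column helper. It assumes the same tables → columns layout as the previous example. Writing the result back to YAML is left out, and the nullable override is only an example of a hint change.

```python
import copy


def override_column(import_schema: dict, export_schema: dict,
                    table: str, column: str, **hints) -> dict:
    """Copy one column definition from the export schema into a copy of the
    import schema and apply hint overrides; nothing else is added.

    Hypothetical helper; assumes {"tables": {...: {"columns": {...}}}} dicts.
    """
    source = export_schema["tables"][table]["columns"][column]
    updated = copy.deepcopy(import_schema)  # leave the caller's dict untouched
    columns = (
        updated.setdefault("tables", {})
        .setdefault(table, {})
        .setdefault("columns", {})
    )
    # Start from the inferred definition, then layer the manual changes on top
    columns[column] = {**copy.deepcopy(source), **hints}
    return updated


# Toy data for illustration only
export_schema = {"tables": {"games": {"columns": {"url": {"data_type": "text", "nullable": True}}}}}
import_schema = {"tables": {}}

new_import = override_column(import_schema, export_schema, "games", "url", nullable=False)
print(new_import)  # {'tables': {'games': {'columns': {'url': {'data_type': 'text', 'nullable': False}}}}}
```

Copying a single column rather than a whole table keeps every other table and column under dlt's automatic inference.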