# Full loading
Full loading is the act of fully reloading the data of your tables. All existing data will be removed and replaced by whatever the source produced on this run. Resources that are not selected while performing a full load will not replace any data in the destination.
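The semantics above can be sketched in plain Python (an illustration only, not how dlt is implemented): a `"replace"` load first drops all existing rows of the target table, while tables that are not loaded in this run keep their data untouched.

```python
def load(tables, table_name, rows, write_disposition="append"):
    """Toy loader: 'replace' drops all existing rows first, 'append' keeps them."""
    if write_disposition == "replace":
        tables[table_name] = []
    tables.setdefault(table_name, []).extend(rows)
    return tables

# "issues" is fully reloaded; "stars" is not selected, so its data survives
tables = {"issues": [{"id": 1}], "stars": [{"user": "a"}]}
load(tables, "issues", [{"id": 2}], write_disposition="replace")
```

After this run, `tables["issues"]` contains only the rows produced by the run, while `tables["stars"]` is unchanged.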
## Performing a full load
To perform a full load on one or more of your resources, set `write_disposition="replace"` for the resource:
```py
import dlt
import requests

# REPO_NAME and headers are assumed to be defined elsewhere
p = dlt.pipeline(destination="bigquery", dataset_name="github")

issues = []
reactions = ["%2B1", "-1", "smile", "tada", "thinking_face", "heart", "rocket", "eyes"]
for reaction in reactions:
    for page_no in range(1, 3):
        page = requests.get(
            f"https://api.github.com/repos/{REPO_NAME}/issues"
            f"?state=all&sort=reactions-{reaction}&per_page=100&page={page_no}",
            headers=headers,
        )
        print(f"Got page for {reaction} page {page_no}, requests left", page.headers["x-ratelimit-remaining"])
        issues.extend(page.json())

p.run(issues, write_disposition="replace", primary_key="id", table_name="issues")
```
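Note that the queries above overlap: an issue can carry several reactions and therefore appear on multiple pages, which is why the `run` call passes `primary_key="id"`. A small, hypothetical helper (not part of dlt) can make the deduplication step explicit if you prefer to do it locally before loading:

```python
def dedupe_by_id(rows):
    """Keep only the first occurrence of each 'id' among the fetched rows."""
    seen = set()
    unique = []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)
    return unique

# the same issue fetched twice via different reaction queries collapses to one row
issues = [{"id": 1}, {"id": 2}, {"id": 1}]
issues = dedupe_by_id(issues)
```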
## Choosing the correct replace strategy for your full load
dlt implements three different strategies for doing a full load on your table: `truncate-and-insert`, `insert-from-staging`, and `staging-optimized`. The exact behavior of these strategies can also vary between the available destinations.

You can select a strategy with a setting in your `config.toml` file. If you do not select a strategy, dlt will default to `truncate-and-insert`.
```toml
[destination]
# Set the optimized replace strategy
replace_strategy = "staging-optimized"
```
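dlt also resolves configuration from environment variables, where TOML section names are upper-cased and joined with double underscores. Assuming the standard resolution rules, the same setting could be provided as:

```shell
export DESTINATION__REPLACE_STRATEGY="staging-optimized"
```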