CSV file format

CSV is the most basic file format for storing tabular data, where all values are strings and are separated by a delimiter (typically a comma). dlt uses it for specific use cases - mostly for performance and compatibility reasons.

Internally, we use two implementations:

  • the PyArrow CSV writer, used for Arrow tables
  • the Python standard library csv writer, used for Python objects

Supported destinations

How to configure

There are several ways of configuring dlt to use the csv file format for the normalization step and to store your data at the destination:
  1. You can set the loader_file_format argument to csv in the run method:

info = pipeline.run(some_source(), loader_file_format="csv")

  2. Alternatively, you can set the file format directly in the resource decorator (a fuller runnable sketch follows below).

@dlt.resource(file_format="csv")
def generate_rows(nr):
    pass
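
For a fuller picture, here is a minimal end-to-end sketch. The resource body, the pipeline name csv_demo, and the choice of the filesystem destination are illustrative assumptions, not part of the snippets above:

import dlt

@dlt.resource(file_format="csv")  # rows from this resource are always stored as csv
def generate_rows(nr):
    for i in range(nr):
        # hypothetical payload, just something to write
        yield {"id": i, "name": f"row {i}"}

# pipeline name and destination are assumptions for this sketch; the filesystem
# destination still needs a bucket_url configured separately (e.g., in config.toml)
pipeline = dlt.pipeline(pipeline_name="csv_demo", destination="filesystem")
info = pipeline.run(generate_rows(10), loader_file_format="csv")
print(info)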

Default settings

dlt attempts to make both writers generate similar-looking files:

  • separators are commas
  • quotes are " and are escaped as ""
  • NULL values appear both as empty strings ("") and as empty tokens, as in the example below
  • UNIX new lines are used
  • dates are represented as ISO 8601
  • quoting style is "when needed"

Example of NULLs:

text1,text2,text3
A,B,C
A,,""

In the last row, both text2 and text3 values are NULL. The Python csv writer is not able to write unquoted None values, so we had to settle for "".
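
A stdlib-only illustration of that constraint (this is the csv module the Python writer is built on, not dlt's internal writer):

import csv
import sys

# with QUOTE_NONNUMERIC every non-numeric value is quoted, so None cannot be
# emitted as a bare empty token; it comes out as a quoted empty string
writer = csv.writer(sys.stdout, quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(["A", None, None])  # prints: "A","",""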

Note: all destinations capable of writing CSVs must support this representation of NULLs.

Change settings

You can change basic csv settings; this may be handy when working with the filesystem destination. Other destinations are tested with standard settings:

  • delimiter: change the delimiting character (default: ',')

  • include_header: include the header row (default: True)

  • quoting: controls when quotes should be generated around field values (illustrated with a small sketch after this list). Available options:

    • quote_needed (default): quote only values that need quoting, i.e., non-numeric values

      • Python CSV writer: All non-numeric values are quoted
      • PyArrow CSV writer: The exact behavior is not fully documented; we observed that in some cases strings are not quoted either
    • quote_all: all values are quoted

      • Supported by both Python CSV writer and PyArrow CSV writer
    • quote_minimal: quote only fields containing special characters (delimiter, quote character, or line terminator)

      • Supported by Python CSV writer only
    • quote_none: never quote fields

      • Python CSV writer: Uses escape character when delimiter appears in data
      • PyArrow CSV writer: Raises an error if data contains special characters
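
If it helps to see the difference, here is a stdlib-only sketch of the four quoting styles; mapping them onto dlt's option names is an assumption made for illustration, not a statement about dlt's internals:

import csv
import io

row = ["a,b", 1, "plain"]
for label, mode in [
    ("quote_needed  (~QUOTE_NONNUMERIC)", csv.QUOTE_NONNUMERIC),
    ("quote_all     (QUOTE_ALL)", csv.QUOTE_ALL),
    ("quote_minimal (QUOTE_MINIMAL)", csv.QUOTE_MINIMAL),
]:
    buf = io.StringIO()
    csv.writer(buf, quoting=mode).writerow(row)
    print(label, "->", buf.getvalue().rstrip())

# quote_none (QUOTE_NONE) needs an escape character once the delimiter appears in the data
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar="\\").writerow(row)
print("quote_none    (QUOTE_NONE) ->", buf.getvalue().rstrip())
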
You can set these options in config.toml:

[normalize.data_writer]
delimiter="|"
include_header=false
quoting="quote_all"

Or using environment variables:

NORMALIZE__DATA_WRITER__DELIMITER=|
NORMALIZE__DATA_WRITER__INCLUDE_HEADER=False
NORMALIZE__DATA_WRITER__QUOTING=quote_all
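
When you read such a file back yourself (for example, from the filesystem destination), pass the same delimiter and remember that there is no header row; the file name below is a placeholder:

import csv

# "exported.csv" is a hypothetical path to a file written with the settings above
with open("exported.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="|"):
        print(row)  # every field arrives as a string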

Destination settings

A few additional settings are available when copying csv files to destination tables (a hedged configuration sketch follows below):

  • on_error_continue - skip lines with errors (only Snowflake)
  • encoding - encoding of the csv file
tip

You'll need these settings when importing external files.
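
A sketch of how this could look in TOML, assuming the options live under the destination's csv_format section (verify the exact section name for your destination and dlt version):

# assumption: csv options are grouped under the destination's csv_format section
[destination.snowflake.csv_format]
on_error_continue=true
encoding="utf-8"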

Limitations

arrow writer

  • binary columns are supported only if they contain valid UTF-8 characters
  • json (nested, struct) types are not supported

csv writer

  • binary columns are supported only if they contain valid UTF-8 characters (easy to add more encodings)
  • json columns are serialized with json.dumps
  • None values are always quoted
