Skip to main content
Version: devel

Troubleshooting

Need help deploying these sources, or figuring out how to run them in your data stack?
Join our Slack community or Book a call with a dltHub Solutions Engineer.

Troubleshooting connection

Connecting to MySQL with SSL

Here, we use the mysql and pymysql dialects to set up an SSL connection to a server, with all information taken from the SQLAlchemy docs.

  1. To enforce SSL on the client without a client certificate you may pass the following DSN:

    sources.sql_database.credentials="mysql+pymysql://root:<pass>@<host>:3306/mysql?ssl_ca="
  2. You can also pass the server's public certificate (potentially bundled with your pipeline) and disable host name checks:

    sources.sql_database.credentials="mysql+pymysql://root:<pass>@<host>:3306/mysql?ssl_ca=server-ca.pem&ssl_check_hostname=false"
  3. For servers requiring a client certificate, provide the client's private key (a secret value). In Airflow, this is usually saved as a variable and exported to a file before use. The server certificate is omitted in the example below:

    sources.sql_database.credentials="mysql+pymysql://root:<pass>@35.203.96.191:3306/mysql?ssl_ca=&ssl_cert=client-cert.pem&ssl_key=client-key.pem"

SQL Server connection options

To connect to an mssql server using Windows authentication, include trusted_connection=yes in the connection string.

sources.sql_database.credentials="mssql+pyodbc://loader.database.windows.net/dlt_data?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server"

To connect to a local sql server instance running without SSL pass encrypt=no parameter:

sources.sql_database.credentials="mssql+pyodbc://loader:loader@localhost/dlt_data?encrypt=no&driver=ODBC+Driver+17+for+SQL+Server"

To allow self signed SSL certificate when you are getting certificate verify failed:unable to get local issuer certificate:

sources.sql_database.credentials="mssql+pyodbc://loader:loader@localhost/dlt_data?TrustServerCertificate=yes&driver=ODBC+Driver+17+for+SQL+Server"

To use long strings (>8k) and avoid collation errors:

sources.sql_database.credentials="mssql+pyodbc://loader:loader@localhost/dlt_data?LongAsMax=yes&driver=ODBC+Driver+17+for+SQL+Server"

Troubleshooting backends

Notes on specific databases

Oracle

  1. When using the oracledb dialect in thin mode we are getting protocol errors. Use thick mode or cx_oracle (old) client.
  2. Mind that SQLAlchemy translates Oracle identifiers into lower case! Keep the default dlt naming convention (snake_case) when loading data. We'll support more naming conventions soon.
  3. Connectorx is for some reason slower for Oracle than the PyArrow backend.

See here for information and code on setting up and benchmarking on Oracle.

DB2

  1. Mind that SQLAlchemy translates DB2 identifiers into lower case! Keep the default dlt naming convention (snake_case) when loading data. We'll support more naming conventions soon.
  2. The DB2 type DOUBLE gets incorrectly mapped to the python type float (instead of the SqlAlchemy type Numeric with default precision). This requires dlt to perform additional casts. The cost of the cast, however, is minuscule compared to the cost of reading rows from database.

See here for information and code on setting up and benchmarking on db2.

MySQL

  1. The SqlAlchemy dialect converts doubles to decimals. (This can be disabled via the table adapter argument as shown in the code example here)

Postgres / MSSQL

No issues were found for these databases. Postgres is the only backend where we observed 2x speedup with ConnectorX (see here for the benchmarking code). On other db systems it performs the same as (or some times worse than) the PyArrow backend.

Notes on specific data types

JSON

In the SQLAlchemy backend JSON data type is represented as a Python object, and in the PyArrow backend, it is represented as a JSON string. At present it does not work correctly with pandas and ConnectorXwhich cast Python objects to str, generating invalid JSON strings that cannot be loaded into destination.

UUID

UUIDs are represented as string by default. You can switch this behavior by using table_adapter_callback to modify properties of the UUID type for a particular column. (See the code example here for how to modify the data type properties of a particular column.)

This demo works on codespaces. Codespaces is a development environment available for free to anyone with a Github account. You'll be asked to fork the demo repository and from there the README guides you with further steps.
The demo uses the Continue VSCode extension.

Off to codespaces!

DHelp

Ask a question

Welcome to "Codex Central", your next-gen help center, driven by OpenAI's GPT-4 model. It's more than just a forum or a FAQ hub – it's a dynamic knowledge base where coders can find AI-assisted solutions to their pressing problems. With GPT-4's powerful comprehension and predictive abilities, Codex Central provides instantaneous issue resolution, insightful debugging, and personalized guidance. Get your code running smoothly with the unparalleled support at Codex Central - coding help reimagined with AI prowess.