How-to connect to text data using Microsoft ODBC

Querona allows you to connect to data sources using multiple technologies like ODBC, ADO.Net or JDBC. In this how-to, we will configure and use the Microsoft ODBC Text driver to connect to CSV and TSV text files.

Warning

ODBC driver has a hard limit of 256 columns in a file. If your file has more than 256 columns, the driver will return first 256 columns and ignore the rest.

Prerequisites

This connector requires a driver to talk to the database. The required ODBC driver may be already installed on the machine hosting Querona. If the driver is not installed, the attempt to create the data source connection displays an error message. In such an event, please follow the installation instructions at ODBC.

For beginners, we recommend to read the quick-start article Connect to the data source(s).

Creating connection to the CSV files

Note

The ODBC Text driver supports both delimited and fixed-width text files. A text file consists of an optional header line and zero or more text lines. For details, refer to Microsoft Text File Format documentation.

Assuming that your source text files are stored in the c:\data\csv folder, create the data source connection using the following settings:

Property Value
Provider ODBC
Dialect Microsoft ODBC Text
Connection string Driver=Microsoft Access Text Driver (*.txt, *.csv); Dbq=[ServerName]; FIL=text; DriverId=27; Extensions=csv,tsv,txt;
Server name c:\data\csv
Username /
Password /

Alternatively, you can use the connection string that utilizes the existing ODBC DSN, manually created using x64 version of ODBC Data Sources 64 bit in Windows. The DSN-connection string would be DSN=name-of-your-odbc-connection.

Warning

When specifying a folder path do not end it with a backslash character (\). If you do use trailing slash, no files will show up when your try to create a virtual database. This is the behaviour of the ODBC driver, not Querona’s.

Connection string properties that can be used are:

Property Description
DBQ Designates targeted directory; for example, C:\data\csv.
DefaultDir Default directory (working directory).
Description An optional description of the data in the data source; for example, Hire date, salary history, and current review of all employees.
DSN A name that identifies the data source, such as Sales or Personnel.
Extensions Lists the file name extensions of the text files on the data source. To display files without extensions, *. must be added to the list.
HDR Indicates that the first row of the file holds the name of the column. Possible values: YES, NO. Defaults to YES. Example usage: HDR=NO;
MaxBufferSize Defaults to 2048.
MaxScanRows The number of rows to scan to determine the data type of each column. The data type is determined given the maximum number of kinds of data found. If data is encountered that does not match the data type guessed for the column, the data type will be returned as a NULL value. For the Text driver, you may enter a number from 1 to 32767 for the number of rows to scan; however, the value will always default to 25. (A number outside the limit will return an error.)
ReadOnly Designates the database as read-only. Possible values: YES, NO. Defaults to NO; for example, ReadOnly=YES;
FIL Category of the data source. For text data source it is “text”.
DriverId Id of the driver to use. For Text driver, it is 27.
UID Id of the user.
PWD The password corresponding to the user ID, or an empty string if there is no password for the user ID (PWD=;).
PageTimeout Defaults to 5.
SafeTransactions Defaults to 0.
Threads Defaults to 3.

When you’re ready, click the Test connection and Create to save your connection.

Note

If you need to define the format (schema) of a file, you need to create a DSN on the server and then use the “Define format” option in ODBC Manager GUI. It displays the Define Text Format dialog box and enables you to specify the schema for individual tables in the data source directory. Alternatively, you can manually create a schema.ini file and store it in the folder containing your text files. For details, please navigate to Microsoft Schema.ini file documentation and supported Text File Data Types.

You can now create a virtual database in a standard way, using the connection created. Next, perform the manual configuration of metadata described below.

Manual configuration of metadata

A manual modification of table metadata in Querona is required after you import metadata from ODBC.

Please follow these steps:

  1. Logon to Querona Admin Portal and navigate to the :term:’virtual database’ you created
  2. For each virtual table in the database perform the following actions:
    • In the Summary click Edit
    • In “Physical table name” field remove the schema name generated by the metadata importer; for example, change the [c:\datacsv].[my_file_name.csv] into [my_file_name.csv].
    • If the field “Virtual table name” contains a hypen character (-) or dot (.), please remove it; for example, change the my_column-name.csv into my_column_name. ODBC driver does not accept aliases that contain a hypen nor dot character.
    • Save your changes and verify that data is readable by clicking Preview data and Show data.