How-to connect to text data using Microsoft ODBC¶
Querona allows you to connect to data sources using multiple technologies like ODBC, ADO.Net or JDBC. In this how-to, we will configure and use the Microsoft ODBC Text driver to connect to CSV and TSV text files.
ODBC driver has a hard limit of 256 columns in a file. If your file has more than 256 columns, the driver will return first 256 columns and ignore the rest.
This connector requires a driver to talk to the database. The required ODBC driver may be already installed on the machine hosting Querona. If the driver is not installed, the attempt to create the data source connection displays an error message. In such an event, please follow the installation instructions at ODBC.
For beginners, we recommend to read the quick-start article Connect to the data source(s).
Creating connection to the CSV files¶
The ODBC Text driver supports both delimited and fixed-width text files. A text file consists of an optional header line and zero or more text lines. For details, refer to Microsoft Text File Format documentation.
Assuming that your source text files are stored in the c:\data\csv folder, create the data source connection using the following settings:
|Dialect||Microsoft ODBC Text|
|Connection string||Driver=Microsoft Access Text Driver (*.txt, *.csv); Dbq=[ServerName]; FIL=text; DriverId=27; Extensions=csv,tsv,txt;|
Alternatively, you can use the connection string that utilizes the existing ODBC DSN, manually created using x64 version of ODBC Data Sources 64 bit in Windows. The DSN-connection string would be DSN=name-of-your-odbc-connection.
When specifying a folder path do not end it with a backslash character (\). If you do use trailing slash, no files will show up when your try to create a virtual database. This is the behaviour of the ODBC driver, not Querona’s.
Connection string properties that can be used are:
|DBQ||Designates targeted directory; for example, C:\data\csv.|
|DefaultDir||Default directory (working directory).|
|Description||An optional description of the data in the data source; for example, Hire date, salary history, and current review of all employees.|
|DSN||A name that identifies the data source, such as Sales or Personnel.|
|Extensions||Lists the file name extensions of the text files on the data source. To display files without extensions, *. must be added to the list.|
|HDR||Indicates that the first row of the file holds the name of the column. Possible values: YES, NO. Defaults to YES. Example usage: HDR=NO;|
|MaxBufferSize||Defaults to 2048.|
|MaxScanRows||The number of rows to scan to determine the data type of each column. The data type is determined given the maximum number of kinds of data found. If data is encountered that does not match the data type guessed for the column, the data type will be returned as a NULL value. For the Text driver, you may enter a number from 1 to 32767 for the number of rows to scan; however, the value will always default to 25. (A number outside the limit will return an error.)|
|ReadOnly||Designates the database as read-only. Possible values: YES, NO. Defaults to NO; for example, ReadOnly=YES;|
|FIL||Category of the data source. For text data source it is “text”.|
|DriverId||Id of the driver to use. For Text driver, it is 27.|
|UID||Id of the user.|
|PWD||The password corresponding to the user ID, or an empty string if there is no password for the user ID (PWD=;).|
|PageTimeout||Defaults to 5.|
|SafeTransactions||Defaults to 0.|
|Threads||Defaults to 3.|
When you’re ready, click the Test connection and Create to save your connection.
If you need to define the format (schema) of a file, you need to create a DSN on the server and then use the “Define format” option in ODBC Manager GUI. It displays the Define Text Format dialog box and enables you to specify the schema for individual tables in the data source directory. Alternatively, you can manually create a schema.ini file and store it in the folder containing your text files. For details, please navigate to Microsoft Schema.ini file documentation and supported Text File Data Types.
You can now create a virtual database in a standard way, using the connection created. Next, perform the manual configuration of metadata described below.
Manual configuration of metadata¶
A manual modification of table metadata in Querona is required after you import metadata from ODBC.
Please follow these steps:
- Logon to Querona Admin Portal and navigate to the :term:’virtual database’ you created
- For each virtual table in the database perform the following actions:
- In the Summary click Edit
- In “Physical table name” field remove the schema name generated by the metadata importer; for example, change the [c:\datacsv].[my_file_name.csv] into [my_file_name.csv].
- If the field “Virtual table name” contains a hypen character (-) or dot (.), please remove it; for example, change the my_column-name.csv into my_column_name. ODBC driver does not accept aliases that contain a hypen nor dot character.
- Save your changes and verify that data is readable by clicking Preview data and Show data.