Instance configuration#

There are two modes of connecting to and working with Apache Spark:

  • using one or more local managed instances,

  • using an existing cluster (e.g. Azure HDInsight).

You can configure and run as many instances as needed, as long as each listens on a different port.

Note

To communicate with Spark, Querona uses a Driver: a Java application that acts as a proxy, relaying requests to and from Spark. Once started, the Driver listens for incoming connections from Querona.

Incoming requests are authenticated using an API key. When required, the Driver opens reverse connections to Querona via TDS, using the authentication method set on the Spark connection.
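If an instance refuses to start or cannot be reached, a quick first check is whether anything is listening on the Driver's port. Below is a minimal sketch in Python, assuming the default port 8400 and a Driver running on the local machine:

    import socket

    def driver_port_open(host: str = "127.0.0.1", port: int = 8400) -> bool:
        """Return True if something is listening on host:port."""
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(2)
            return sock.connect_ex((host, port)) == 0

    # A port that reports False here is free, which also makes it a safe
    # choice when configuring an additional instance.
    print("Driver reachable:", driver_port_open())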

Local Instance Configuration#

Navigate to Administer ‣ Spark Instances.


Select the default Spark instance, usually found under a name like “Spark 3.x”.


The following settings can be configured for a given instance:

Name
    A friendly name for the instance, e.g. Spark 3.

Port (default: 8400)
    The port that the driver opens to communicate with Querona.

Protocol (default: Thrift)
    The protocol that the driver uses to exchange data with Querona. Possible values: Thrift, http.

Spark dialect
    The dialect version used to communicate with the driver. Possible values: 2.0, 3.0.

Driver API key (default: querona-key)
    The security key used to authenticate incoming requests; it must match the key given when configuring the connection to Spark. Using the default value is discouraged.

Driver version (default: 3.0.1)
    The version of the driver to deploy to the instance; it should match the Spark version in use.

Delta Lake version (default: 3.0 (Spark 3.5))
    The version of the Delta Lake libraries to use. Each Delta Lake version is compatible with one or more Spark versions. Auto instructs Querona to infer the latest compatible Delta Lake version from the Spark binaries; None disables loading of the Delta Lake libraries. For more information about Delta Lake and Spark compatibility, see the io.delta releases.

SPARK_HOME (default: %PROGRAMDATA%\Querona\lib\spark\spark-3.5.1-bin-hadoop3)
    The root directory of the Spark distribution in use.

SPARK_CONF_DIR (default: %PROGRAMDATA%\Querona\conf\spark)
    The directory where the Spark configuration is stored.

Log directory (default: %PROGRAMDATA%\Querona\logs)
    The directory where Spark log files are generated. No spaces allowed. See also: Enable Log Tracing.

Spark master (default: local[*])
    The Spark master URL.

Driver memory (default: 4g)
    The amount of memory to use for the driver process.

Executor memory (default: 4g)
    The amount of memory to use per executor process.

Total executor cores (default: 4)
    The number of cores to use on each executor.

_JAVA_OPTIONS
    The value of the _JAVA_OPTIONS environment variable picked up by Spark. Note: a value that is too low, such as “-Xms1G”, might trigger the following startup error: “‘Initial’ is not recognized as an internal or external command”.

Autostart (default: False)
    Whether the instance should be started automatically together with Querona.

Persistent (default: False)
    Whether the instance should be left running even after the master Querona process is stopped.

Enable Log Tracing (default: False)
    When checked, Querona traces and forwards Spark log entries. This is useful with logging software such as SEQ, to keep all logs in one place.
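For readers more familiar with plain Spark, the master, memory, and core settings above resemble the standard Spark properties spark.master, spark.driver.memory, spark.executor.memory, and spark.cores.max. The PySpark sketch below illustrates those standard properties using the default values from the table; the one-to-one mapping to Querona's instance fields is an assumption for illustration, not something this page specifies:

    from pyspark.sql import SparkSession

    # Standard Spark properties, shown with the defaults from the table above.
    # The exact mapping from Querona's instance fields is assumed.
    spark = (
        SparkSession.builder
        .master("local[*]")                     # Spark master
        .config("spark.driver.memory", "4g")    # Driver memory
        .config("spark.executor.memory", "4g")  # Executor memory
        .config("spark.cores.max", "4")         # Total executor cores
        .appName("instance-settings-demo")
        .getOrCreate()
    )
    print(spark.sparkContext.getConf().get("spark.master"))
    spark.stop()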

Advanced settings#

Querona stores a few configuration files in C:\ProgramData\Querona\conf\spark (by default).

core-site.xml
    Querona-specific HDFS configuration.

hive-site.xml
    Querona-specific Hive configuration.

querona-site.xml
    Querona configuration parameters.

spark-env.cmd
    Spark environment variables configuration.
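core-site.xml and hive-site.xml follow the standard Hadoop configuration layout: <property> elements with <name> and <value> children. Assuming querona-site.xml uses the same layout, a short Python sketch can dump every entry in the default configuration directory:

    import xml.etree.ElementTree as ET
    from pathlib import Path

    # Default SPARK_CONF_DIR; adjust if your installation differs.
    conf_dir = Path(r"C:\ProgramData\Querona\conf\spark")

    for site_file in sorted(conf_dir.glob("*-site.xml")):
        root = ET.parse(site_file).getroot()
        for prop in root.iter("property"):
            # Hadoop-style files store one setting per <property> element.
            print(f"{site_file.name}: {prop.findtext('name')} = {prop.findtext('value')}")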

Adding a managed instance#

Navigate to Administer ‣ Spark Instances.

To add a new local Spark instance:

  1. Click the Add button

  2. Uniquely name your instance

  3. Set values of the required settings and Save

To access your new instance, you need to create a Connection configuration targeting it.
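Since Querona accepts TDS connections (as noted above for the Driver's reverse connection), one way to smoke-test the new setup is to query Querona with any TDS-capable client. A minimal sketch with pyodbc follows; the server address, port, and credentials are placeholders, and the ODBC driver name is an assumption about your client machine:

    import pyodbc

    # Placeholder connection details -- substitute your Querona host,
    # port, and credentials.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=my-querona-host,1433;"
        "UID=myuser;PWD=mypassword",
        autocommit=True,
    )
    cursor = conn.cursor()
    cursor.execute("SELECT 1")  # simple smoke test through Querona
    print(cursor.fetchone())
    conn.close()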