Connection configuration#

For general information regarding connections, see Create a connection.

When adding a new connection to Spark, the following parameters apply:

  • Name - A friendly name of the connection, e.g. Local Spark 3.0

  • Server addresses - A comma-separated list of server:port pairs to connect to. For a local instance use localhost:port

  • Protocol - The protocol that the driver will use to exchange data with Querona. Possible values: Thrift, http

  • Spark dialect - The protocol version; the dialect version used to communicate with the driver. Possible values: 2.0, 3.0

  • Driver API key - The security key used to authenticate requests incoming from Querona. Using the default value is highly NOT recommended

  • Login - The account to be used for authenticating reverse connections from the driver

  • Password - The password for the account above

  • Default database - The database used for storing data for cached views

  • Reverse connection host address - The address under which Querona can be reached by the driver. For a local instance use localhost

  • Node connection timeout - The timeout used when waiting for each cluster node to respond

  • Cluster initialization timeout - The timeout given for the instance to initialize when handling the initial request

  • Frame size - The frame size used when retrieving query results from Spark. Default value: 16777216 bytes

  • Target partition size - Defines the target size of partitions. Default value: 128000000 bytes

  • Max partitions for auto repartitioning - Defines the upper limit for the number of partitions to be created



When JOIN queries are pushed down to execute on Spark, the data from the original database has to be copied to Spark. Querona registers external data sources as JDBC data frames, optionally applying the repartition(n) function. If a data frame is too large, it is broken into smaller chunks whose size is defined by Target partition size.
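A rough worked example of the arithmetic (this assumes the partition count is approximately the source size divided by Target partition size, which the description above implies but does not state exactly):

```sql
-- A ~1 GB source with the default 128000000-byte Target partition size:
-- ceiling(1000000000 / 128000000) = 8 partitions, subject to the
-- "Max partitions for auto repartitioning" upper limit.
select ceiling(1000000000.0 / 128000000.0) as approx_partitions
```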

Scheduler pools#

To learn about Spark scheduler pools in general, please consult Spark Job Scheduling.

Scheduler pools describe the relative share of the cluster’s resources that will be assigned to process a given task. Querona’s Spark connections support the following scheduler pools:

  • default - the scheduler pool used when no pool is selected explicitly

  • ETL - the scheduler pool used by materialization jobs; executes jobs in FIFO order

  • S, M, L, XL - four pools that users can select for their queries, with ascending allocation of resources

To specify the pool to be used for a given query, use the OPTION clause.
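The exact option keyword is not preserved in this page; the snippet below is a hypothetical sketch (the pool option name is an assumption for illustration only; consult the Querona SQL reference for the supported syntax):

```sql
-- Run the query in the "M" scheduler pool
-- (pool is an assumed option name, shown for illustration only)
select [col]
from [spark-test-db].[dbo].[test_view]
option (pool = 'M')
```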




Exact configuration values for each scheduler pool can be reviewed and configured on the Spark connection screen:

Spark scheduler pools

The pool properties are as follows:

  • Scheduling mode - FIFO means tasks are queued one after another; FAIR means tasks can run in parallel and share resources

  • Weight - Decides how much of the cluster’s resources this pool will get relative to other pools

  • Min share - The share of the cluster’s resources assigned by the FAIR scheduler (if possible) before the rest of the resources are allocated
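In standalone Apache Spark, these same three properties make up a pool definition in fairscheduler.xml; a minimal sketch for orientation (in Querona the values are edited on the Spark connection screen instead, so the pool name and numbers here are illustrative):

```xml
<?xml version="1.0"?>
<!-- Illustrative Spark fair-scheduler pool definition; in Querona these
     values are configured on the Spark connection screen, not in this file. -->
<allocations>
  <pool name="M">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>4</minShare>
  </pool>
</allocations>
```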

You can review the pools used by Spark for each query in Spark’s admin portal; go to the ‘Stages’ tab:

Spark stages

Spark variables#

Any property prefixed with spark can be defined in this section and will be used when creating a new Spark context. Notable parameters include:

  • spark.ui.port - The Spark UI port. Default value: 4040

  • spark.local.dir - The local working directory of Spark


Hive configuration#

Internally, Spark uses part of the Hive functionality. The important property is the warehouse directory, a file-system folder used as physical data storage.

Depending on the version of Spark used, this parameter is set by either hive.metastore.warehouse.dir or spark.sql.warehouse.dir. These values are set in the config files but can also be passed at the connection level.

We highly recommend always consulting the Spark documentation to make sure the right values are set. Excerpt from the Apache Spark v2.2 documentation:


Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of the database in a warehouse. You may need to grant write privilege to the user who starts the Spark application.
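Following that advice, passing the location as a connection-level Spark variable might look as follows (the path is illustrative and depends on your deployment):

```
spark.sql.warehouse.dir=/var/querona/spark-warehouse
```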

For the managed Spark instance, the following Hive parameters may need to be configured (in plain Hive these correspond to the javax.jdo.option.Connection* metastore properties):

  • JDBC connection string to the Hive metastore. Default value: jdbc:sqlserver://localhost; databaseName=SparkMetastore;

  • JDBC login to the Hive metastore

  • JDBC password to the Hive metastore

  • Hive tmp folder

Connection test and Spark validation#

Once configured, the Test Connection button can be used to verify whether the instance is responsive. Note that this only tests whether the given driver instance responds to incoming connections. To fully test the configuration, the following steps are one way to trigger and verify Spark initialization:

  • Navigate to DATABASES and create a new database named “spark-test-db” linked to the newly created connection

Create Spark db
  • On the newly created database, add a new view of type Custom SQL named “test_view”:

Create Spark view
  • On the next screen type select 1 as col; and click Parse

Create Spark view
  • Finish creating the view

  • Click the view on the list and go to Caching and Partitioning, click Edit

  • Set Caching mode to Persistent caching - one table and Save

Create Spark view
  • Go to Management tasks and run Rebuild persistent cache task

Create Spark view
  • Go to query editor and run the following query on the Spark database:

select [col]
from [spark-test-db].[dbo].[test_view]
union all
select [col]
from [spark-test-db].[dbo].[test_view]
  • The result should look more or less like this:

Create Spark view

These actions should force Spark to initialize. To verify it, change the URL in the browser from e.g. http://querona-example:9000 to http://querona-example:4040, where 4040 is the Spark GUI port set by the spark.ui.port variable (4040 is the default value).

The Spark GUI should open and you should see the executed query:

Local Spark

You can click the tabs to see different jobs and parameters, which might be useful for future debugging and optimization.