How to migrate Spark’s default database with the metastore on Derby#

The default Apache Spark database storage location configured by Querona is the %ProgramData%\Querona\warehouse folder. This is acceptable for development and test environments, but for production we recommend changing it to a different location.

This guide describes the actions required to migrate the Spark default database to a different location when Apache Derby is used as the Spark metastore.

Introduction and Prerequisites#

The built-in Spark instance keeps its metadata in one of the supported engines:

  • Microsoft SQL Server in a database called SparkMetastore on a designated MS SQL Server instance

  • Apache Derby in the folder c:\ProgramData\Querona\metastore_db on the Querona server

In this article we focus on Derby. If you use MS SQL Server as the metastore, the procedure is similar; however, you need to adjust the SQL commands to match the T-SQL dialect.

To migrate Spark’s metadata hosted on Derby, we need to update two tables in the Spark metastore database: DBS and SDS, which keep the locations of databases and tables, respectively.

The user or users performing the migration steps need to collectively have:

  • administrative access to Querona,

  • administrative access to Spark metadata database,

  • administrative access to the folder holding the actual Spark database files,

  • administrative access to the destination folder.

Basic knowledge of SQL language is required.

Migration procedure#

Step 1, shutdown the Spark instance#

Connect to Querona’s administrative portal with administrative privileges and shut down the Spark instance: ADMINISTER ‣ Local Spark instances, select your local Spark connection and click STOP.

Warning

It is vital to stop the Spark instance before attempting any metadata operations.

Step 2, prepare a backup of the Spark metastore#

Before updating any data, prepare a backup of the Spark metastore.
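As an illustration, backing up the Derby metastore amounts to copying its folder. The sketch below (Python; the paths in the usage comment are illustrative and must be adapted to your environment) copies the folder to a timestamped backup directory:

```python
import shutil
import time
from pathlib import Path

def backup_metastore(src: str, backup_root: str) -> Path:
    """Copy the Derby metastore folder to a timestamped backup directory."""
    dest = Path(backup_root) / f"metastore_db-backup-{time.strftime('%Y%m%d-%H%M%S')}"
    # copytree fails if dest already exists, which protects existing backups
    shutil.copytree(src, dest)
    return dest

# Example (illustrative paths):
# backup_metastore(r"c:\ProgramData\Querona\metastore_db", r"d:\backups")
```

Remember to run the backup only after the Spark instance has been stopped (Step 1), so the Derby files are not in use.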

Step 3, create and configure a destination folder for database files#

Create a folder that will hold the database files of the migrated database using the tools of your operating system. In this article we assume that the destination folder is d:\warehouse.

Note

It is recommended to leave the source files intact and delete them only at the end of the procedure, once you have verified that the migrated database is readable.

The newly created folder does not yet have the security Access Control Lists (ACLs) that Spark requires set on it. To set the necessary ACLs, go to %ProgramData%\Querona\lib\tools\, make a copy of the file FixHiveACL.bat, name it FixHiveACL-new.bat, and edit the copy.

Change the lines inside the FixHiveACL-new.bat:

..\hadoop\hadoop-2.6.0\bin\winutils.exe chmod 777 ../../tmp/hive
..\hadoop\hadoop-2.6.0\bin\winutils.exe chmod 777 ../../warehouse

using our example destination path, into:

..\hadoop\hadoop-2.6.0\bin\winutils.exe chmod 777 d:/warehouse/tmp/hive
..\hadoop\hadoop-2.6.0\bin\winutils.exe chmod 777 d:/warehouse

and save the file.
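The edit above is mechanical, so it can be sanity-checked with a small script. The following sketch (a hypothetical helper, assuming the destination path uses forward slashes as in the batch file) generates the two chmod lines for a given destination folder:

```python
# Relative path to winutils.exe, as used in FixHiveACL.bat
WINUTILS = r"..\hadoop\hadoop-2.6.0\bin\winutils.exe"

def acl_commands(dest: str) -> list[str]:
    """Build the chmod lines for FixHiveACL-new.bat.

    `dest` is the destination folder with forward slashes, e.g. "d:/warehouse".
    """
    dest = dest.rstrip("/")
    return [
        f"{WINUTILS} chmod 777 {dest}/tmp/hive",
        f"{WINUTILS} chmod 777 {dest}",
    ]
```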

Open an elevated command prompt, change the current directory to %ProgramData%\Querona\lib\tools\ and execute FixHiveACL-new.bat. The script should execute without errors. If an “access denied” error occurs, you most probably did not open the command prompt with administrative privileges.

Step 4, copy files to the destination folder#

Copy the contents of %ProgramData%\Querona\warehouse into the destination folder (e.g. d:\warehouse). While the copy is in progress, you can proceed to Step 5.
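When you later verify the copy (Step 7), comparing the two folder trees is one way to confirm it completed. Below is a minimal sketch in Python, under the assumption that matching relative paths and file sizes is a sufficient check for your data:

```python
from pathlib import Path

def tree_snapshot(root: str) -> dict[str, int]:
    """Map each file's path relative to `root` to its size in bytes."""
    base = Path(root)
    return {str(p.relative_to(base)): p.stat().st_size
            for p in base.rglob("*") if p.is_file()}

def copy_is_complete(src: str, dest: str) -> bool:
    """True when every file under `src` exists under `dest` with the same size."""
    src_files = tree_snapshot(src)
    dest_files = tree_snapshot(dest)
    return all(dest_files.get(rel) == size for rel, size in src_files.items())
```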

Step 5, connect to the Derby database holding the Spark metastore#

  1. Download Apache Derby 10.14.2.0 and uncompress it to a folder on the Querona server

  2. Ensure that a recent backup of metastore_db database is available

  3. Login to the Querona server using Remote Desktop

  4. Start Windows command line and change your current directory to the folder holding Apache Derby files obtained in (1)

  5. Change directory to bin

  6. Type the ij command and press Enter

  7. In ij console run the following commands:

connect 'jdbc:derby:c:\ProgramData\Querona\metastore_db';
autocommit OFF;

Note

If ij returns an error stating that the metastore was created with a newer version of Derby, download the reported version from the Apache Derby archive <https://archive.apache.org/dist/db/derby> and start again using the newer version of ij.

Step 6, modify file locations in the metastore#

Using the ij console run the following SQL commands.

List contents of DBS table and verify the current location of the database:

SELECT * FROM APP.DBS;

Results of the query should look like shown below:

DB_ID  DESC                   DB_LOCATION_URI                         NAME     OWNER_NAME  OWNER_TYPE
-----  ---------------------  --------------------------------------  -------  ----------  ----------
1      Default Hive database  file:/c:/ProgramData/Querona/warehouse  default  public      ROLE

Edit the following script, changing the string 'file://drive:/your/new/path' to the destination folder path (do not add a trailing slash ‘/’):

-- DBS

update APP.DBS
   set DB_LOCATION_URI = 'file://drive:/your/new/path'
 where DB_LOCATION_URI = 'file:/c:/ProgramData/Querona/warehouse'
;

The execution result should be similar to the one below:

1 row affected

Edit the following script, changing the string 'file://drive:/your/new/path/' to the destination folder path (add a trailing slash ‘/’):

-- SDS
-- The new prefix already ends with '/', so skip the old prefix plus the '/'
-- that follows it (hence "+ 2" with Derby's 1-based SUBSTR).

update APP.SDS
   set LOCATION = 'file://drive:/your/new/path/' || substr(LOCATION, length('file:/c:/ProgramData/Querona/warehouse') + 2)
 where LOCATION like 'file:/c:/ProgramData/Querona/warehouse/%';

The execution result should be similar to the one below, depending on how many tables you have:

4632 rows affected
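The string manipulation performed by the SDS update can be sanity-checked outside of SQL. The sketch below mirrors the same rewrite in Python; the old prefix is the article’s example, while the new prefix 'file://d:/warehouse/' is an assumed destination:

```python
# Old prefix from the article; the new prefix is an assumed destination
OLD = "file:/c:/ProgramData/Querona/warehouse"
NEW = "file://d:/warehouse/"  # trailing slash, as required by the SDS script

def rewrite_location(location: str) -> str:
    """Replicate the SDS update: drop the old prefix and the '/' after it,
    then prepend the new prefix (which already ends with '/')."""
    assert location.startswith(OLD + "/")
    return NEW + location[len(OLD) + 1:]
```

For example, a table stored at 'file:/c:/ProgramData/Querona/warehouse/sales' becomes 'file://d:/warehouse/sales'.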

Verify that all entries were updated as expected and issue the following statement to save your changes:

commit;

Close the ij console by issuing the exit command:

exit;

Step 7, start Spark and verify that data is readable#

Ensure that the copying process from Step 4 completed successfully.

Use the Querona Administrative Portal to start the Spark instance: ADMINISTER ‣ Local Spark instances, select your local Spark connection and click START. Verify that data from the migrated database is available, for example by executing queries in Querona that read data through Spark.

Step 8, plan to delete the source folder if not needed#

After successful verification of the migration, the source files may be deleted if no longer needed.

That concludes the Spark database migration procedure.

See Also#