Glossary#

ADO.Net#

ADO.NET is a data access technology from the Microsoft .NET Framework that provides communication between relational and non-relational systems through a common set of components. Programmers use these software components to access data and data services from a database. ADO.NET is part of the base class library included with the Microsoft .NET Framework.

base table#

A virtual table that directly maps to an object in the data source. A base table cannot be materialized directly, because its physical data is stored at the data source.

column-level security#

Column-Level Security (CLS) enables access control to database table columns based on the user’s execution context or their group membership.
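
In Querona, CLS is enforced by the engine itself; purely as an illustration of the idea, the sketch below models it in plain Python. The role names, table, and `select_with_cls` helper are all hypothetical.

```python
# Minimal sketch of column-level security: each role may read only a
# whitelist of columns; everything else is stripped from the result.
ALLOWED_COLUMNS = {
    "analyst": {"id", "country", "order_total"},   # hypothetical roles
    "support": {"id", "email"},
}

def select_with_cls(rows, role):
    """Return rows containing only the columns the role may see."""
    allowed = ALLOWED_COLUMNS.get(role, set())
    return [{col: val for col, val in row.items() if col in allowed}
            for row in rows]

rows = [{"id": 1, "email": "a@example.com", "country": "PL", "order_total": 99.0}]
print(select_with_cls(rows, "support"))
# → [{'id': 1, 'email': 'a@example.com'}]  (order_total and country hidden)
```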

columnar database#

A columnar database stores data by columns rather than by rows, which makes it more suitable for analytical query processing, and thus for a data warehouse.
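
A small Python sketch of the storage difference (the table and values are invented for illustration): an analytical aggregate over a columnar layout touches one contiguous column instead of scanning whole rows.

```python
# The same table in row-oriented and column-oriented layout.
rows = [
    {"id": 1, "city": "Oslo",   "amount": 10.0},
    {"id": 2, "city": "Gdansk", "amount": 25.0},
    {"id": 3, "city": "Oslo",   "amount": 7.5},
]

# Columnar layout: one contiguous array per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An analytical aggregate reads a single column, not entire rows.
total = sum(columns["amount"])
print(total)  # → 42.5
```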

data anonymization#

Data anonymization has been defined as a “process by which personal data is irreversibly altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party.” Data anonymization enables the transfer of information across a boundary, such as between two departments within an agency or between two agencies, while reducing the risk of unintended disclosure, and in certain environments in a manner that enables evaluation and analytics post-anonymization.

data connection#
data source connection#

A data source connection is an object that holds all the information, such as the server name or URL and user credentials or tokens, needed to authenticate to the data source, extract metadata, and query data.

data consumer#
data consumers#

A person or system that consumes data from Querona by connecting, issuing SQL queries and receiving results.

data fabric#

Data fabric is an architecture that facilitates the end-to-end integration of various data pipelines and cloud environments through the use of intelligent and automated systems.

data governance#

A principled approach to managing data during its life cycle, from acquisition to use to disposal. It includes a set of rules, processes and technology to ensure data is secure, private, accurate, available, and usable.

data lake#

A centralized storage system that stores large volumes of raw, unstructured, and semi-structured data from various sources, designed for big data and real-time analytics.

data lakehouse#

A data lakehouse is a data architecture that merges a data lake and a data warehouse into one data management solution. It combines the key benefits of data lakes (large repositories of raw data in its original form) and data warehouses (organized sets of structured data), using low-cost storage for large amounts of raw data while providing structure and data management functions.

data mart#

A smaller, specialized version of a data warehouse, focusing on a particular subject or department, serving specific needs or user groups.

data masking#

Data masking or data obfuscation is the process of hiding original data with modified content (characters or other data). The main reason for masking a data field is to protect data classified as personally identifiable, personally sensitive, or commercially sensitive; the masked data must nevertheless remain usable and look real and consistent.
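
A minimal example of the idea (the `mask_card` helper is invented for illustration): the masked value protects the sensitive digits while keeping the field's length and format, so it still looks real and consistent.

```python
def mask_card(number: str) -> str:
    """Mask all but the last four digits, preserving length and format."""
    return "*" * (len(number) - 4) + number[-4:]

print(mask_card("4111111111111111"))  # → ************1111
```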

data mesh#

A decentralized, domain-oriented approach to data platform architecture and organizational design, treating data as a product with individual teams responsible for their own domain-specific data products.

data pipeline#

A set of processes and tools used to ingest, process, and load data from one system to another, handling data integration, transformation, and processing tasks.

data provider#

A data provider is a library, either built-in or custom, that implements the low-level connectivity to a data source.

data pseudonymization#

Data pseudonymization is a data management and de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms. A single pseudonym for each replaced field, or collection of replaced fields, makes the data record less identifiable while remaining suitable for data analysis and data processing. Pseudonymization can be one way to comply with the European Union's General Data Protection Regulation demands for secure storage of personal information. Pseudonymized data can be restored to its original state with the addition of information that allows individuals to be re-identified, while anonymized data can never be restored to its original state. The pseudonym allows tracking data back to its origins, which distinguishes pseudonymization from data anonymization, where all person-related data that could allow backtracking has been purged.
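
The reversibility that distinguishes pseudonymization from anonymization can be sketched as follows (the `Pseudonymizer` class and the `SUBJ-NNNN` token format are invented for illustration): pseudonyms are stable and the mapping, kept separately, allows re-identification.

```python
import itertools

class Pseudonymizer:
    """Replace identifying values with stable pseudonyms. The mapping is
    kept separately so records can be re-identified when authorized,
    unlike anonymization, which is irreversible."""

    def __init__(self):
        self._forward = {}
        self._reverse = {}
        self._counter = itertools.count(1)

    def pseudonymize(self, value):
        if value not in self._forward:
            pseudonym = f"SUBJ-{next(self._counter):04d}"
            self._forward[value] = pseudonym
            self._reverse[pseudonym] = value
        return self._forward[value]

    def reidentify(self, pseudonym):
        return self._reverse[pseudonym]

p = Pseudonymizer()
token = p.pseudonymize("jane.doe@example.com")
print(token)                # → SUBJ-0001
print(p.reidentify(token))  # → jane.doe@example.com
```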

data source#

A data source is any source of digitized data, such as a database, a file, or a data stream.

data virtualization#

Data virtualization uses a software abstraction layer to create a unified, integrated, fully usable view of data, without physically copying, transforming or loading the source data to a target system. The abstraction hides most of the technical aspects of how and where data resides, is stored and processed, allowing data to be accessed regardless of the interfaces and technologies used at the data sources. Data virtualization functionality enables organizations to create virtual data warehouses, data lakes and data marts without the cost and complexity of building and managing separate platforms for each. While data virtualization can be used alongside ETL, it can be an alternative to ETL and to other physical data integration methods, and is one of the technologies that enable data fabric architecture.

data warehouse#

A Data Warehouse (DW or DWH) is a centralized, consistent repository of structured and semi-structured data, collected from various sources within an organization, designed to support efficient data querying, reporting, analytics, data mining, artificial intelligence (AI) and machine learning.

dbms#
DBMS#

Database Management System.

DDL#

Data Definition Language. Those statements in SQL that define, as opposed to manipulate, data. For example, CREATE TABLE, CREATE INDEX, GRANT, and REVOKE.
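
The DDL statements named above can be demonstrated with SQLite (via Python's standard `sqlite3` module) as a stand-in dialect; note that SQLite supports `CREATE TABLE` and `CREATE INDEX` but not `GRANT`/`REVOKE`. The table and index names are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# DDL statements define schema objects rather than manipulate rows.
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE INDEX idx_customers_name ON customers (name)")

# The schema catalog now lists both objects.
names = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master ORDER BY name")]
print(names)  # → ['customers', 'idx_customers_name']
```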

DML#

Data Manipulation Language. Those statements in SQL that manipulate, as opposed to define, data. For example, INSERT, UPDATE, DELETE, and SELECT.
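
All four DML statements named above, again using SQLite through Python's standard `sqlite3` module as a stand-in dialect (table and values invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")  # DDL, once

# DML statements manipulate the rows themselves.
con.execute("INSERT INTO orders (total) VALUES (10.0), (25.0)")
con.execute("UPDATE orders SET total = total + 5 WHERE id = 1")
con.execute("DELETE FROM orders WHERE total > 20")
rows = con.execute("SELECT id, total FROM orders").fetchall()
print(rows)  # → [(1, 15.0)]
```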

dynamic data masking#

Dynamic data masking (DDM) masks data in real time. DDM changes the data stream so that the data consumer does not get access to the sensitive values, while no physical changes to the original data take place.
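
One way to picture the mechanism, sketched here with a SQLite view through Python's standard `sqlite3` module (the table, view, and sample value are invented): the consumer queries a rewritten data stream while the stored rows remain untouched.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, email TEXT)")
con.execute("INSERT INTO users VALUES (1, 'jane.doe@example.com')")

# Dynamic masking: the consumer reads through a view that rewrites the
# data stream on the fly; the stored rows are never modified.
con.execute("""
    CREATE VIEW users_masked AS
    SELECT id, substr(email, 1, 2) || '****' ||
           substr(email, instr(email, '@')) AS email
    FROM users
""")

masked = con.execute("SELECT email FROM users_masked").fetchone()[0]
original = con.execute("SELECT email FROM users").fetchone()[0]
print(masked)    # → ja****@example.com
print(original)  # original value untouched: jane.doe@example.com
```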

Iceberg#
Apache Iceberg#

The open table format for analytic datasets.

integration virtual database#

A virtual database that supports data caching using one of the supported data processing systems, usually a DBMS or a cloud service.

JDBC#

Java Database Connectivity (JDBC) is an application programming interface (API) for the programming language Java, which defines how a client may access a database.

metadata#

A set of data that describes and gives information about other data.

There are three main types of metadata according to NISO definitions:

  • Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords.

  • Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.

  • Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it.

ODBC#

Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS).

OLE DB#

OLE DB (Object Linking and Embedding, Database, sometimes written as OLEDB or OLE-DB), an API designed by Microsoft, allows accessing data from a variety of sources in a uniform manner. The API provides a set of interfaces implemented using the Component Object Model (COM); it is otherwise unrelated to OLE.

pass-through virtual database#

A virtual database that uses a direct connection to a data source. All queries against this VDB type are translated into the data access technology supported by the source, for example an SQL dialect or an API call, and execute on the data source without any caching.

row-level security#

Row-Level Security (RLS) enables you to use group membership or execution context to control access to rows in a database table. In Querona, RLS supports the filter predicates that silently filter the rows available to read operations.
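
As a concept-only sketch (not Querona's implementation), a filter predicate bound to the execution context can be modeled with SQLite through Python's standard `sqlite3` module; the users, regions, and `read_sales` helper are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("EMEA", 100.0), ("APAC", 50.0), ("EMEA", 75.0)])

# Hypothetical mapping of users to the rows they may read.
USER_REGION = {"alice": "EMEA", "bob": "APAC"}

def read_sales(user):
    """A filter predicate silently restricts rows for read operations."""
    return con.execute("SELECT region, amount FROM sales WHERE region = ?",
                       (USER_REGION[user],)).fetchall()

print(read_sales("bob"))  # → [('APAC', 50.0)]
```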

Spark#
Apache Spark#

Apache Spark is a unified analytics engine for large-scale data processing.

SQL Client#

Any software that accesses databases using the TDS protocol and T-SQL as the query language.

It can be a low-level client library such as ADO.NET, JDBC, ODBC, or OLE DB, or a high-level client such as SSMS, Power BI, or Tableau that uses a low-level client underneath.

static data masking#

With Static Data Masking, the user configures how masking operates for each column selected inside the database. Static Data Masking will then replace data in the database copy with new, masked data generated according to that configuration. Original data cannot be unmasked from the masked copy.
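
In contrast to dynamic masking, static masking produces a separate, permanently altered copy. A concept-only sketch using SQLite through Python's standard `sqlite3` module (the table, masking rule, and sample value are invented):

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE people (id INTEGER, ssn TEXT)")
source.execute("INSERT INTO people VALUES (1, '123-45-6789')")

# Static masking: a copy is produced with the configured columns
# replaced by generated values; the copy cannot be unmasked.
masked_copy = sqlite3.connect(":memory:")
masked_copy.execute("CREATE TABLE people (id INTEGER, ssn TEXT)")
for row_id, ssn in source.execute("SELECT id, ssn FROM people"):
    masked_copy.execute("INSERT INTO people VALUES (?, ?)",
                        (row_id, "XXX-XX-" + ssn[-4:]))

masked_ssn = masked_copy.execute("SELECT ssn FROM people").fetchone()[0]
print(masked_ssn)  # → XXX-XX-6789
```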

TDS#

The Tabular Data Stream (TDS) Protocol is an application-level protocol used for the transfer of requests and responses between clients and database server systems. In such systems, the client will typically establish a long-lived connection with the server. TDS includes facilities for authentication and identification, channel encryption negotiation, issuing of SQL batches, stored procedure calls, returning data, and transaction manager requests. Returned data is self-describing and record-oriented. The data streams describe the names, types and optional descriptions of the rows being returned. For more information see MS-TDS.

TSQL#
T-SQL#

Transact-SQL (T-SQL) is Microsoft’s and Sybase’s proprietary extension to the SQL (Structured Query Language) used to interact with relational databases. T-SQL expands on the SQL standard to include procedural programming, local variables, various support functions for string processing, date processing, mathematics, etc., and changes to the DELETE and UPDATE statements.

All applications that communicate with an instance of SQL Server do so by sending Transact-SQL statements to the server, regardless of the user interface of the application.

In Querona, T-SQL also refers to the specific SQL dialect behind SQL Server and the emulation of it built into Querona.

virtual database#
virtual databases#
VDB#

A virtual database is a type of database management system that serves as a container to transparently view and query several other databases through a uniform API, as if they were a single entity. The underlying databases are connected via a computer network and accessed as if they formed a single database. A virtual database’s goal is to view and access data in a unified way without needing to copy and duplicate it in several databases or manually combine the results from many queries. Each of the combined databases is completely self-sustaining and functional, able to operate on its own without depending on the other databases. When an application requests data from a virtual database, the system determines which of the underlying databases contain the requested data and passes the request on to those databases.

virtual table#

A virtual table is an object in metadata that holds information about a remote object: usually a table, a view, or the tabular result of a query against the source system.