Glossary#

ADO.Net#

ADO.NET is a data access technology from the Microsoft .NET Framework that provides communication between relational and non-relational systems through a common set of components. Programmers use these software components to access data and data services from a database. ADO.NET is part of the base class library included with the Microsoft .NET Framework.

base table#

A virtual table that directly maps to an object in the data source. A base table cannot be materialized directly, because its physical data is stored at the data source.

column-level security#

Column-Level Security (CLS) enables access control to database table columns based on the user’s execution context or their group membership.
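
In Querona, CLS is enforced by the engine itself; purely as an illustration of the idea, the sketch below models it in plain Python. The role names, table, and `select_with_cls` helper are all hypothetical.

```python
# Minimal sketch of column-level security: each role may read only a
# whitelist of columns; everything else is stripped from the result.
ALLOWED_COLUMNS = {
    "analyst": {"id", "country", "order_total"},   # hypothetical roles
    "support": {"id", "email"},
}

def select_with_cls(rows, role):
    """Return rows containing only the columns the role may see."""
    allowed = ALLOWED_COLUMNS.get(role, set())
    return [{col: val for col, val in row.items() if col in allowed}
            for row in rows]

rows = [{"id": 1, "email": "a@example.com", "country": "PL", "order_total": 99.0}]
print(select_with_cls(rows, "support"))
# → [{'id': 1, 'email': 'a@example.com'}]  (order_total and country hidden)
```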

columnar database#

A columnar database stores data by columns rather than by rows, which makes it more suitable for analytical query processing, and thus for a data warehouse.
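
A small Python sketch of the storage difference (the table and values are invented for illustration): an analytical aggregate over a columnar layout touches one contiguous column instead of scanning whole rows.

```python
# The same table in row-oriented and column-oriented layout.
rows = [
    {"id": 1, "city": "Oslo",   "amount": 10.0},
    {"id": 2, "city": "Gdansk", "amount": 25.0},
    {"id": 3, "city": "Oslo",   "amount": 7.5},
]

# Columnar layout: one contiguous array per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An analytical aggregate reads a single column, not entire rows.
total = sum(columns["amount"])
print(total)  # → 42.5
```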

data anonymization#

Data anonymization has been defined as a “process by which personal data is irreversibly altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party.” Data anonymization enables the transfer of information across a boundary, such as between two departments within an agency or between two agencies, while reducing the risk of unintended disclosure, and in certain environments in a manner that enables evaluation and analytics post-anonymization.

data connection#
data source connection#

A data source connection is an object that holds all the information, such as the server name or URL and user credentials or tokens, needed to authenticate to the data source, extract metadata, and query data.

data consumer#
data consumers#

A person or system that consumes data from Querona by connecting, issuing SQL queries and receiving results.

data fabric#

Data fabric is an architecture that facilitates the end-to-end integration of various data pipelines and cloud environments through the use of intelligent and automated systems.

data governance#

A principled approach to managing data during its life cycle, from acquisition to use to disposal. It includes a set of rules, processes and technology to ensure data is secure, private, accurate, available, and usable.

data lake#

A centralized storage system that stores large volumes of raw, unstructured, and semi-structured data from various sources, designed for big data and real-time analytics.

data lakehouse#

A data lakehouse is a data architecture that merges a data lake and a data warehouse into one data management solution. It combines the key benefits of data lakes (large repositories of raw data in its original form) and data warehouses (organized sets of structured data), using low-cost storage for large amounts of raw data while providing structure and data management functions.

data mart#

A smaller, specialized version of a data warehouse, focusing on a particular subject or department, serving specific needs or user groups.

data masking#

Data masking or data obfuscation is the process of hiding original data with modified content (characters or other data). The main reason for masking a data field is to protect data classified as personally identifiable, personally sensitive, or commercially sensitive; the masked data must nevertheless remain usable and look real and consistent.
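
A minimal example of the idea (the `mask_card` helper is invented for illustration): the masked value protects the sensitive digits while keeping the field's length and format, so it still looks real and consistent.

```python
def mask_card(number: str) -> str:
    """Mask all but the last four digits, preserving length and format."""
    return "*" * (len(number) - 4) + number[-4:]

print(mask_card("4111111111111111"))  # → ************1111
```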

data mesh#

A decentralized, domain-oriented approach to data platform architecture and organizational design, treating data as a product with individual teams responsible for their own domain-specific data products.

data pipeline#

A set of processes and tools used to ingest, process, and load data from one system to another, handling data integration, transformation, and processing tasks.

data provider#

A data provider is a library, either built-in or custom, that implements the low-level connectivity to a data source.

data pseudonymization#

Data pseudonymization is a data management and de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms. A single pseudonym for each replaced field, or collection of replaced fields, makes the data record less identifiable while remaining suitable for data analysis and data processing. Pseudonymization can be one way to comply with the European Union's General Data Protection Regulation demands for secure storage of personal information. Pseudonymized data can be restored to its original state with the addition of information that allows individuals to be re-identified, while anonymized data can never be restored to its original state. The pseudonym allows tracking data back to its origins, which distinguishes pseudonymization from data anonymization, where all person-related data that could allow backtracking has been purged.
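
The reversibility that distinguishes pseudonymization from anonymization can be sketched as follows (the `Pseudonymizer` class and the `SUBJ-NNNN` token format are invented for illustration): pseudonyms are stable and the mapping, kept separately, allows re-identification.

```python
import itertools

class Pseudonymizer:
    """Replace identifying values with stable pseudonyms. The mapping is
    kept separately so records can be re-identified when authorized,
    unlike anonymization, which is irreversible."""

    def __init__(self):
        self._forward = {}
        self._reverse = {}
        self._counter = itertools.count(1)

    def pseudonymize(self, value):
        if value not in self._forward:
            pseudonym = f"SUBJ-{next(self._counter):04d}"
            self._forward[value] = pseudonym
            self._reverse[pseudonym] = value
        return self._forward[value]

    def reidentify(self, pseudonym):
        return self._reverse[pseudonym]

p = Pseudonymizer()
token = p.pseudonymize("jane.doe@example.com")
print(token)                # → SUBJ-0001
print(p.reidentify(token))  # → jane.doe@example.com
```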

data source#

A data source is any source of digitized data, such as a database, a file, or a data stream.

data virtualization#

Data virtualization uses a software abstraction layer to create a unified, integrated, fully usable view of data, without physically copying, transforming or loading the source data to a target system. The abstraction hides most of the technical aspects of how and where data resides, is stored and processed, allowing data to be accessed regardless of the interfaces and technologies used at the data sources. Data virtualization functionality enables organizations to create virtual data warehouses, data lakes and data marts without the cost and complexity of building and managing separate platforms for each. While data virtualization can be used alongside ETL, it can be an alternative to ETL and to other physical data integration methods, and is one of the technologies that enable data fabric architecture.

data warehouse#

A Data Warehouse (DW or DWH) is a centralized, consistent repository of structured and semi-structured data, collected from various sources within an organization, designed to support efficient data querying, reporting, analytics, data mining, artificial intelligence (AI) and machine learning.

dbms#
DBMS#

Database Management System.

DDL#

Data Definition Language. Those statements in SQL that define, as opposed to manipulate, data. For example, CREATE TABLE, CREATE INDEX, GRANT, and REVOKE.
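
The DDL statements named above can be demonstrated with SQLite (via Python's standard `sqlite3` module) as a stand-in dialect; note that SQLite supports `CREATE TABLE` and `CREATE INDEX` but not `GRANT`/`REVOKE`. The table and index names are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# DDL statements define schema objects rather than manipulate rows.
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE INDEX idx_customers_name ON customers (name)")

# The schema catalog now lists both objects.
names = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master ORDER BY name")]
print(names)  # → ['customers', 'idx_customers_name']
```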

DML#

Data Manipulation Language. Those statements in SQL that manipulate, as opposed to define, data. For example, INSERT, UPDATE, DELETE, and SELECT.
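
All four DML statements named above, again using SQLite through Python's standard `sqlite3` module as a stand-in dialect (table and values invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")  # DDL, once

# DML statements manipulate the rows themselves.
con.execute("INSERT INTO orders (total) VALUES (10.0), (25.0)")
con.execute("UPDATE orders SET total = total + 5 WHERE id = 1")
con.execute("DELETE FROM orders WHERE total > 20")
rows = con.execute("SELECT id, total FROM orders").fetchall()
print(rows)  # → [(1, 15.0)]
```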

dynamic data masking#

Dynamic data masking (DDM) masks data in real time. DDM changes the data stream so that the data consumer does not get access to the sensitive values, while no physical changes to the original data take place.
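
One way to picture the mechanism, sketched here with a SQLite view through Python's standard `sqlite3` module (the table, view, and sample value are invented): the consumer queries a rewritten data stream while the stored rows remain untouched.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, email TEXT)")
con.execute("INSERT INTO users VALUES (1, 'jane.doe@example.com')")

# Dynamic masking: the consumer reads through a view that rewrites the
# data stream on the fly; the stored rows are never modified.
con.execute("""
    CREATE VIEW users_masked AS
    SELECT id, substr(email, 1, 2) || '****' ||
           substr(email, instr(email, '@')) AS email
    FROM users
""")

masked = con.execute("SELECT email FROM users_masked").fetchone()[0]
original = con.execute("SELECT email FROM users").fetchone()[0]
print(masked)    # → ja****@example.com
print(original)  # original value untouched: jane.doe@example.com
```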

Iceberg#
Apache Iceberg#

The open table format for analytic datasets.

integration virtual database#

A virtual database that supports data caching using one of the supported data processing systems, usually a DBMS or a cloud service.

JDBC#

Java Database Connectivity (JDBC) is an application programming interface (API) for the programming language Java, which defines how a client may access a database.

metadata#

A set of data that describes and gives information about other data.

There are three main types of metadata according to NISO definitions:

  • Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords.

  • Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.

  • Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it.

ODBC#

Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS).

OLE DB#

OLE DB (Object Linking and Embedding, Database, sometimes written as OLEDB or OLE-DB), an API designed by Microsoft, allows accessing data from a variety of sources in a uniform manner. The API provides a set of interfaces implemented using the Component Object Model (COM); it is otherwise unrelated to OLE.

pass-through virtual database#

A virtual database that uses a direct connection to a data source. All queries against this VDB type are translated into the data access technology supported by the source, for example an SQL dialect or an API call, and execute on the data source without any caching.

row-level security#

Row-Level Security (RLS) enables you to use group membership or execution context to control access to rows in a database table. In Querona, RLS supports the filter predicates that silently filter the rows available to read operations.
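
As a concept-only sketch (not Querona's implementation), a filter predicate bound to the execution context can be modeled with SQLite through Python's standard `sqlite3` module; the users, regions, and `read_sales` helper are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("EMEA", 100.0), ("APAC", 50.0), ("EMEA", 75.0)])

# Hypothetical mapping of users to the rows they may read.
USER_REGION = {"alice": "EMEA", "bob": "APAC"}

def read_sales(user):
    """A filter predicate silently restricts rows for read operations."""
    return con.execute("SELECT region, amount FROM sales WHERE region = ?",
                       (USER_REGION[user],)).fetchall()

print(read_sales("bob"))  # → [('APAC', 50.0)]
```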

Spark#
Apache Spark#

Apache Spark is a unified analytics engine for large-scale data processing.

SQL Client#

Any software that accesses databases using the TDS protocol and T-SQL as the query language.

It can be a low-level client library such as ADO.NET, JDBC, ODBC, or OLE DB, or a high-level client such as SSMS, Power BI, or Tableau that uses a low-level client underneath.

static data masking#

With Static Data Masking, the user configures how masking operates for each column selected inside the database. Static Data Masking will then replace data in the database copy with new, masked data generated according to that configuration. Original data cannot be unmasked from the masked copy.
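
In contrast to dynamic masking, static masking produces a separate, permanently altered copy. A concept-only sketch using SQLite through Python's standard `sqlite3` module (the table, masking rule, and sample value are invented):

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE people (id INTEGER, ssn TEXT)")
source.execute("INSERT INTO people VALUES (1, '123-45-6789')")

# Static masking: a copy is produced with the configured columns
# replaced by generated values; the copy cannot be unmasked.
masked_copy = sqlite3.connect(":memory:")
masked_copy.execute("CREATE TABLE people (id INTEGER, ssn TEXT)")
for row_id, ssn in source.execute("SELECT id, ssn FROM people"):
    masked_copy.execute("INSERT INTO people VALUES (?, ?)",
                        (row_id, "XXX-XX-" + ssn[-4:]))

masked_ssn = masked_copy.execute("SELECT ssn FROM people").fetchone()[0]
print(masked_ssn)  # → XXX-XX-6789
```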

TDS#

The Tabular Data Stream (TDS) Protocol is an application-level protocol used for the transfer of requests and responses between clients and database server systems. In such systems, the client will typically establish a long-lived connection with the server. TDS includes facilities for authentication and identification, channel encryption negotiation, issuing of SQL batches, stored procedure calls, returning data, and transaction manager requests. Returned data is self-describing and record-oriented. The data streams describe the names, types and optional descriptions of the rows being returned. For more information see MS-TDS.

TSQL#
T-SQL#

Transact-SQL (T-SQL) is Microsoft’s and Sybase’s proprietary extension to the SQL (Structured Query Language) used to interact with relational databases. T-SQL expands on the SQL standard to include procedural programming, local variables, various support functions for string processing, date processing, mathematics, etc., and changes to the DELETE and UPDATE statements.

All applications that communicate with an instance of SQL Server do so by sending Transact-SQL statements to the server, regardless of the user interface of the application.

In Querona, T-SQL also refers to the specific SQL dialect behind SQL Server and the emulation of it built into Querona.

virtual database#
virtual databases#
VDB#

A virtual database is a type of database management system that serves as a container to transparently view and query several other databases through a uniform API, as if they were a single entity. The underlying databases are connected via a computer network and accessed as if they formed a single database. A virtual database’s goal is to view and access data in a unified way without needing to copy and duplicate it in several databases or manually combine the results from many queries. Each of the combined databases is completely self-sustaining and functional, able to operate on its own without depending on the other databases. When an application requests data from a virtual database, the system determines which of the underlying databases contain the requested data and passes the request on to those databases.

virtual table#

A virtual table is an object in metadata that holds information about a remote object: usually a table, a view, or the tabular result of a query against the source system.