Overview of Query Retargeting#

The use of pre-computed aggregate data is a technique to address performance challenges in Data Warehouse systems (DW). An aggregate retargeting mechanism redirects a query to an available aggregate table(s) when possible.

Performance is a critical issue in DW systems. Pre-computed aggregation is the most effective strategy to reduce the time needed to access large numbers of rows.

Aggregates usage has to be transparent and non-intrusive. The following transparency aspects are addressed by Querona:

  • aggregate unawareness - data consumer is not aware that pre-aggregate was used during query execution

  • execution engine unawareness - data consumer is not aware what execution engine hosts pre-aggregate used

  • complexity unawareness - data consumer is not aware of pre-aggregate structure and how many pre-aggregates were used, if any

The adoption of aggregates should be completely transparent to DW users for two main reasons:

  • users typically do not have the necessary technical skills to make the appropriate choices on how to conveniently use aggregates to improve performance,

  • as decision requirements change constantly, aggregates may have to change accordingly (e.g. removal of obsolete aggregates).

An aggregate retargeting mechanism provides this transparency, by redirecting a query, if possible, to the available aggregate table(s) that can answer the query.

Objectives of Query Retargeting#

Objectives are as follows:

  • improve the performance of Data Warehouse and Data Marts

  • improve architecture flexibility

  • allow for a delay of optimization decisions until the performance requirements are fully known

  • allow for ad-hoc optimizations for quick reaction to performance issues in production

How Query Retargeting works#

An aggregate, also known as a summary table, is a materialized view that stores redundant, consolidated data. In Querona we prefer to use the term “target”, because a materialized view does not have to be an aggregate. In fact, any materialized view can be used as target: for example, a wide table can be reduced in a view (using projection) to a few columns that are needed most. Such target can then be used to substitute a wide table in queries issued by users.

The adoption of aggregates requires the resolution of the following:

  • finding and defining the best set of aggregates

  • efficiently maintaining the aggregates when new data is shipped to the data warehouse

  • efficiently computing these aggregate views from multiple sources

  • exploring the aggregates to answer queries.

Note

The current version of Querona leaves the finding, defining and maintaining aggregates to the user, usually a skilled data warehousing specialist. In essence, it is a user responsibility to define views that will be considered by the query rewrite engine as targets. For now, a semi-automated or fully-automated pre-aggregate candidate proposition is not supported.

Query retargeting in action#

To illustrate the problem at hand, let us consider the following example. Consider tables below:

FactSales

Prod_Id

Month

AmountSold

P1

200101

500

P2

200102

500

P1

200101

1000

P2

200102

3000

DimProducts

Prod_Id

ProdName

P1

Product 1

P2

Product 2

Suppose that a critical query is to report monthly sales by product:

Month

ProdName

AmountSold

200101

Product 1

1500

200102

Product 2

3500

The query can be formulated like this:

1SELECT Month
2     , ProdName
3     , Sum(AmountSold) as MonthlySales
4  FROM FactSales
5  JOIN DimProducts
6    ON FactSales.Prod_id = DimProducts.Prod_id
7 GROUP
8    BY Month, ProdName

Assume that the FactSales table is very large and expected query response time is below one second.

To use query retargeting we have to:

  • define a view - target view name or virtual database it belongs to do not matter

  • define caching of a view - either caching mode will do

  • mark view as ‘Query rewrite target’

  • materialize the view by rebuilding the cache

For example, we have to define the following view:

 1CREATE VIEW v_summary_by_month
 2AS
 3    SELECT Month
 4         , ProdName
 5         , Sum(AmountSold) as MonthlySales
 6      FROM FactSales
 7      JOIN DimProducts
 8        ON FactSales.Prod_id = DimProducts.Prod_id
 9     GROUP
10        BY Month, ProdName

To make it work we also need to enable the caching of the view v_summary_by_month and rebuild the cache (materialize the view).

When a user executes the query:

1SELECT Month, ProdName, Sum(AmountSold) as Sales
2  FROM FactSales
3  JOIN DimProducts
4    ON FactSales.Prod_id = DimProducts.Prod_id
5 GROUP
6    BY Month, ProdName

Querona will rewrite the query and retarget it to a materialized view with ready-made pre-aggregate. The resulting query would be similar to:

1SELECT Month, ProdName, MonthlySales as Sales
2  FROM v_summary_by_month

Response time will be much better thanks to utilization of the already prepared results.

More examples of supported variants of the query:

1SELECT Month, ProdName, Sum(AmountSold) as Sales
2  FROM FactSales
3  JOIN DimProducts
4    ON FactSales.Prod_id = DimProducts.Prod_id
5 WHERE Month BETWEEN (200101 and 200102)
6 GROUP
7    BY Month, ProdName
1SELECT Sum(AmountSold)
2  FROM FactSales
3  JOIN DimProducts
4    ON FactSales.Prod_id = DimProducts.Prod_id
5 WHERE Month >= 200101