Overview of Query Retargeting

The use of pre-computed aggregate data is a technique to address performance challenges in Data Warehouse systems (DW). An aggregate retargeting mechanism redirects a query to an available aggregate table(s) when possible.

Performance is a critical issue in DW systems. Pre-computed aggregation is the most effective strategy to reduce the time needed to access large numbers of rows.

Aggregates usage has to be transparent and non-intrusive. The following transparency aspects are addressed by Querona:

  • aggregate unawareness - data consumer is not aware that pre-aggregate was used during query execution
  • execution engine unawareness - data consumer is not aware what execution engine hosts pre-aggregate used
  • complexity unawareness - data consumer is not aware of pre-aggregate structure and how many pre-aggregates were used, if any

The adoption of aggregates should be completely transparent to DW users for two main reasons:

  • users typically do not have the necessary technical skills to make the appropriate choices on how to conveniently use aggregates to improve performance,
  • as decision requirements change constantly, aggregates may have to change accordingly (e.g. removal of obsolete aggregates).

An aggregate retargeting mechanism provides this transparency, by redirecting a query, if possible, to the available aggregate table(s) that can answer the query.

Objectives of Query Retargeting

Objectives are as follows:

  • improve the performance of Data Warehouse and Data Marts
  • improve architecture flexibility
  • allow for a delay of optimization decisions until the performance requirements are fully known
  • allow for ad-hoc optimizations for quick reaction to performance issues in production

How Query Retargeting works

An aggregate, also known as a summary table, is a materialized view that stores redundant, consolidated data. In Querona we prefer to use the term “target”, because a materialized view does not have to be an aggregate. In fact, any materialized view can be used as target: for example, a wide table can be reduced in a view (using projection) to a few columns that are needed most. Such target can then be used to substitute a wide table in queries issued by users.

The adoption of aggregates requires the resolution of the following:

  • finding and defining the best set of aggregates
  • efficiently maintaining the aggregates when new data is shipped to the data warehouse
  • efficiently computing these aggregate views from multiple sources
  • exploring the aggregates to answer queries.

Note

The current version of Querona leaves the finding, defining and maintaining aggregates to the user, usually a skilled data warehousing specialist. In essence, it is a user responsibility to define views that will be considered by the query rewrite engine as targets. For now, a semi-automated or fully-automated pre-aggregate candidate proposition is not supported.

Query retargeting in action

To illustrate the problem at hand, let us consider the following example. Consider tables below:

FactSales

Prod_Id Month AmountSold
P1 200101 500
P2 200102 500
P1 200101 1000
P2 200102 3000

DimProducts

Prod_Id ProdName
P1 Product 1
P2 Product 2

Suppose that a critical query is to report monthly sales by product:

Month ProdName AmountSold
200101 Product 1 1500
200102 Product 2 3500

The query can be formulated like this:

1
2
3
4
5
6
7
8
SELECT Month
     , ProdName
     , Sum(AmountSold) as MonthlySales
  FROM FactSales
  JOIN DimProducts
    ON FactSales.Prod_id = DimProducts.Prod_id
 GROUP
    BY Month, ProdName

Assume that the FactSales table is very large and expected query response time is below one second.

To use query retargeting we have to:

  • define a view - target view name or virtual database it belongs to do not matter
  • define caching of a view - either caching mode will do
  • mark view as ‘Query rewrite target’
  • materialize the view by rebuilding the cache

For example, we have to define the following view:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE VIEW v_summary_by_month
AS
    SELECT Month
         , ProdName
         , Sum(AmountSold) as MonthlySales
      FROM FactSales
      JOIN DimProducts
        ON FactSales.Prod_id = DimProducts.Prod_id
     GROUP
        BY Month, ProdName

To make it work we also need to enable the caching of the view v_summary_by_month and rebuild the cache (materialize the view).

When a user executes the query:

1
2
3
4
5
6
SELECT Month, ProdName, Sum(AmountSold) as Sales
  FROM FactSales
  JOIN DimProducts
    ON FactSales.Prod_id = DimProducts.Prod_id
 GROUP
    BY Month, ProdName

Querona will rewrite the query and retarget it to a materialized view with ready-made pre-aggregate. The resulting query would be similar to:

1
2
SELECT Month, ProdName, MonthlySales as Sales
  FROM v_summary_by_month

Response time will be much better thanks to utilization of the already prepared results.

More examples of supported variants of the query:

1
2
3
4
5
6
7
SELECT Month, ProdName, Sum(AmountSold) as Sales
  FROM FactSales
  JOIN DimProducts
    ON FactSales.Prod_id = DimProducts.Prod_id
 WHERE Month BETWEEN (200101 and 200102)
 GROUP
    BY Month, ProdName
1
2
3
4
5
SELECT Sum(AmountSold)
  FROM FactSales
  JOIN DimProducts
    ON FactSales.Prod_id = DimProducts.Prod_id
 WHERE Month >= 200101