Overview of Query Retargeting#
The use of pre-computed aggregate data is a technique to address performance challenges in Data Warehouse systems (DW). An aggregate retargeting mechanism redirects a query to an available aggregate table(s) when possible.
Performance is a critical issue in DW systems. Pre-computed aggregation is the most effective strategy to reduce the time needed to access large numbers of rows.
Aggregates usage has to be transparent and non-intrusive. The following transparency aspects are addressed by Querona:
aggregate unawareness - data consumer is not aware that pre-aggregate was used during query execution
execution engine unawareness - data consumer is not aware what execution engine hosts pre-aggregate used
complexity unawareness - data consumer is not aware of pre-aggregate structure and how many pre-aggregates were used, if any
The adoption of aggregates should be completely transparent to DW users for two main reasons:
users typically do not have the necessary technical skills to make the appropriate choices on how to conveniently use aggregates to improve performance,
as decision requirements change constantly, aggregates may have to change accordingly (e.g. removal of obsolete aggregates).
An aggregate retargeting mechanism provides this transparency, by redirecting a query, if possible, to the available aggregate table(s) that can answer the query.
Objectives of Query Retargeting#
Objectives are as follows:
improve the performance of Data Warehouse and Data Marts
improve architecture flexibility
allow for a delay of optimization decisions until the performance requirements are fully known
allow for ad-hoc optimizations for quick reaction to performance issues in production
How Query Retargeting works#
An aggregate, also known as a summary table, is a materialized view that stores redundant, consolidated data. In Querona we prefer to use the term “target”, because a materialized view does not have to be an aggregate. In fact, any materialized view can be used as target: for example, a wide table can be reduced in a view (using projection) to a few columns that are needed most. Such target can then be used to substitute a wide table in queries issued by users.
The adoption of aggregates requires the resolution of the following:
finding and defining the best set of aggregates
efficiently maintaining the aggregates when new data is shipped to the data warehouse
efficiently computing these aggregate views from multiple sources
exploring the aggregates to answer queries.
Note
The current version of Querona leaves the finding, defining and maintaining aggregates to the user, usually a skilled data warehousing specialist. In essence, it is a user responsibility to define views that will be considered by the query rewrite engine as targets. For now, a semi-automated or fully-automated pre-aggregate candidate proposition is not supported.
Query retargeting in action#
To illustrate the problem at hand, let us consider the following example. Consider tables below:
FactSales
Prod_Id |
Month |
AmountSold |
---|---|---|
P1 |
200101 |
500 |
P2 |
200102 |
500 |
P1 |
200101 |
1000 |
P2 |
200102 |
3000 |
DimProducts
Prod_Id |
ProdName |
---|---|
P1 |
Product 1 |
P2 |
Product 2 |
Suppose that a critical query is to report monthly sales by product:
Month |
ProdName |
AmountSold |
---|---|---|
200101 |
Product 1 |
1500 |
200102 |
Product 2 |
3500 |
The query can be formulated like this:
1SELECT Month
2 , ProdName
3 , Sum(AmountSold) as MonthlySales
4 FROM FactSales
5 JOIN DimProducts
6 ON FactSales.Prod_id = DimProducts.Prod_id
7 GROUP
8 BY Month, ProdName
Assume that the FactSales table is very large and expected query response time is below one second.
To use query retargeting we have to:
define a view - target view name or virtual database it belongs to do not matter
define caching of a view - either caching mode will do
mark view as ‘Query rewrite target’
materialize the view by rebuilding the cache
For example, we have to define the following view:
1CREATE VIEW v_summary_by_month
2AS
3 SELECT Month
4 , ProdName
5 , Sum(AmountSold) as MonthlySales
6 FROM FactSales
7 JOIN DimProducts
8 ON FactSales.Prod_id = DimProducts.Prod_id
9 GROUP
10 BY Month, ProdName
To make it work we also need to enable the caching of the view v_summary_by_month
and rebuild the cache (materialize the view).
When a user executes the query:
1SELECT Month, ProdName, Sum(AmountSold) as Sales
2 FROM FactSales
3 JOIN DimProducts
4 ON FactSales.Prod_id = DimProducts.Prod_id
5 GROUP
6 BY Month, ProdName
Querona will rewrite the query and retarget it to a materialized view with ready-made pre-aggregate. The resulting query would be similar to:
1SELECT Month, ProdName, MonthlySales as Sales
2 FROM v_summary_by_month
Response time will be much better thanks to utilization of the already prepared results.
More examples of supported variants of the query:
1SELECT Month, ProdName, Sum(AmountSold) as Sales
2 FROM FactSales
3 JOIN DimProducts
4 ON FactSales.Prod_id = DimProducts.Prod_id
5 WHERE Month BETWEEN (200101 and 200102)
6 GROUP
7 BY Month, ProdName
1SELECT Sum(AmountSold)
2 FROM FactSales
3 JOIN DimProducts
4 ON FactSales.Prod_id = DimProducts.Prod_id
5 WHERE Month >= 200101